Minority and endangered languages on the web

By Marie Lebert, 3 July 2019.

Minority languages on the web

Guy Antoine, a Haitian-American software developer born in Haiti and living in New York, founded the website Windows on Haiti in April 1998 in his free time to promote Haitian Creole, a French-based creole language spoken not only in Haiti but also in the Dominican Republic, the United States, Canada, and other countries.

Guy Antoine wrote in June 2001 in an email interview: “Who are the Haitian people without Kreyól [Haitian Creole], the language that has evolved and bound various African tribes transplanted in Haiti during the slavery period? It is the most palpable exponent of commonality that defines us as a people. However, it is primarily a spoken language, not a widely written one. I see the web changing this situation more than any traditional means of language dissemination. Our site aims to be a major source of information about Haitian culture, and a tool to counter the persistently negative images of Haiti from the traditional media. The scope of this effort extends beyond mere commentary to the diversity of arts and history, cuisine and music, literature and reminiscences of traditional Haitian life. In short, the site opens some new windows to the culture of Haiti.”

“The primary language of Windows on Haiti is English, but one will equally find a center of lively discussion conducted in Kreyól. In addition, one will find documents related to Haiti in French, in the old colonial Creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web. Kreyól is the only national language of Haiti, and one of its two official languages, the other being French. It is hardly a minority language in the Caribbean context, since it is spoken by eight to ten million people. I have created two discussion forums on my website, held exclusively in Kreyól. One is for general discussions just about everything but obviously more focused on Haiti’s current socio-political problems. The other is reserved only to debates of writing standards for Kreyol. Those debates have been quite spirited with the participation of a number of linguistic experts. The uniqueness of these forums is their non-academic nature.”

According to some linguists, a language dies every 14 days. One way to counter this trend is to bring language communities together via the internet, to help revitalize their languages through digital technology, and to strengthen their presence on social media.

Kevin Scannell, a computer scientist and professor at Saint Louis University in Missouri, United States, created Indigenous Tweets in his free time to identify tweets in indigenous and minority languages. He designed An Crúbadán, statistical software that crawls the web to find Twitter threads. Indigenous Tweets identified 35 languages in March 2011, 71 languages in April 2011, 144 languages in March 2013, and 184 languages in October 2017.

Indigenous Tweets’ home page lists all languages identified as being active on Twitter. Clicking the row for a given language leads to a page that lists users tweeting in that language (up to 500 users) with statistics for each user: number of tweets, number of followers, percentage of tweets in the given language (some users tweet in both a global language and a minority language), and date of the latest tweet. The main minority languages were Haitian Creole (with users from the Caribbean, North America and other places), Basque and Welsh. People can also get in touch directly via Twitter, and a number of joint projects started this way.
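The per-user statistics described above can be illustrated with a short Python sketch. This is not the code behind Indigenous Tweets: the `UserStats` record, the `language_page` function, the ranking by number of tweets in the language, and the sample accounts are all assumptions made for illustration. Only the listed statistics and the cap of 500 users come from the description above.

```python
from dataclasses import dataclass
from datetime import date

MAX_USERS = 500  # cap on users listed per language page, as described in the article


@dataclass
class UserStats:
    handle: str              # hypothetical account name
    tweets_total: int        # all tweets by the user
    tweets_in_language: int  # tweets identified as being in the given language
    followers: int
    latest_tweet: date

    @property
    def language_share(self) -> float:
        """Percentage of the user's tweets written in the given language."""
        if self.tweets_total == 0:
            return 0.0
        return 100.0 * self.tweets_in_language / self.tweets_total


def language_page(users: list[UserStats]) -> list[UserStats]:
    """Rank users by tweets in the language (an assumed ordering), keep at most MAX_USERS."""
    ranked = sorted(users, key=lambda u: u.tweets_in_language, reverse=True)
    return ranked[:MAX_USERS]


# Made-up sample data: one mostly minority-language user, one bilingual user.
sample = [
    UserStats("user_a", 200, 150, 40, date(2017, 10, 1)),
    UserStats("user_b", 1000, 300, 900, date(2017, 9, 15)),
]
for user in language_page(sample):
    print(user.handle, user.tweets_in_language, f"{user.language_share:.0f}%")
```

The percentage column is what distinguishes a user who tweets almost only in a minority language from one who mixes it with a global language.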

As explained by Kevin Scannell on his blog: “Speakers of indigenous and minority languages around the world are struggling to keep their languages and cultures alive. More and more language groups are turning to the web as a tool for language revitalization, and as a result there are now thousands of people blogging and using social media sites like Facebook and Twitter in their native language. These sites have allowed sometimes scattered communities to connect and use their languages online in a natural way. Social media have also been important in engaging young people, who are the most important demographic in language revitalization efforts. Together we’re breaking down the idea that only global languages like English and French have a place online!”

“The primary aim of Indigenous Tweets is to help build online language communities through Twitter. We hope that the site makes it easier for speakers of indigenous and minority languages to find each other in the vast sea of English, French, Spanish, and other global languages that dominate Twitter. Even speakers of languages like Basque and Welsh with vibrant online communities have been surprised to find just how many people there are tweeting in their language. This is the other goal of Indigenous Tweets: it’s a message to the world that says ‘We are here and we’re proud of our languages’. For languages with just a few users, I hope it inspires some people to start — make your voice heard!”

Kevin Scannell created a second website, Indigenous Blogs, in September 2011 to identify blogs written in indigenous and minority languages, and to offer a similar platform for people to get in touch. He began with blogs hosted by Blogspot (which also hosts his own blog), WordPress and Tumblr. Indigenous Blogs identified blogs in 50 languages in September 2011, blogs in 74 languages in March 2013, and blogs in 85 languages in October 2017.

Endangered languages on the web

UNESCO (United Nations Educational, Scientific and Cultural Organization) launched its free trilingual online “Atlas of the World’s Languages in Danger” as a complement to the printed trilingual (English, French, Spanish) edition (3rd edition, 2010) edited by linguist Christopher Moseley. Previous editions of the atlas in 1996 and 2001 only existed in print.

The online atlas included 2,473 languages in 2010, and 2,464 languages in 2017. It can be searched by country and area, language name, number of speakers from/to, language vitality, and ISO 639-3 code. The names of the 2,464 languages are transcribed into English, French and Spanish, and their alternate names (spelling variants, dialects and names in non-Roman scripts) are also provided.

UNESCO experts have established six degrees (safe, vulnerable, definitely endangered, severely endangered, critically endangered, extinct) to define the vitality or endangerment of a language. (1) “Safe” (not included in the atlas) means that the language is spoken by all generations and that intergenerational transmission is uninterrupted. (2) “Vulnerable” means that most children speak the language, but it may be restricted to certain places, for example at home. (3) “Definitely endangered” means that children no longer learn the language as a mother tongue at home. (4) “Severely endangered” means that the language is spoken by grandparents and older generations; the parent generation may understand it, but doesn’t use it with their children or among themselves. (5) “Critically endangered” means that the youngest speakers are grandparents and older, who speak the language partially and infrequently. (6) “Extinct” means that there are no speakers left; the atlas includes languages presumed extinct since the 1950s.
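For readers who think in data structures, the six degrees and the atlas search criteria can be sketched in Python. This is an illustration only, not UNESCO's data model or search interface: the `Vitality` enum, the `AtlasEntry` record and the `search` function are hypothetical names, and the second sample entry is a placeholder (the code "qaa" belongs to the private-use range of ISO 639-3). The Scottish Gaelic entry uses its ISO 639-3 code "gla" and the classification given in this article.

```python
from dataclasses import dataclass
from enum import IntEnum


class Vitality(IntEnum):
    """The six degrees, numbered as in the paragraph above."""
    SAFE = 1
    VULNERABLE = 2
    DEFINITELY_ENDANGERED = 3
    SEVERELY_ENDANGERED = 4
    CRITICALLY_ENDANGERED = 5
    EXTINCT = 6


@dataclass(frozen=True)
class AtlasEntry:
    name: str
    iso_639_3: str
    country: str
    vitality: Vitality


def in_atlas(vitality: Vitality) -> bool:
    """'Safe' languages are not included in the atlas."""
    return vitality != Vitality.SAFE


def search(entries, *, country=None, vitality=None, iso_639_3=None):
    """Return the entries matching every criterion that is not None."""
    return [
        e for e in entries
        if (country is None or e.country == country)
        and (vitality is None or e.vitality == vitality)
        and (iso_639_3 is None or e.iso_639_3 == iso_639_3)
    ]


entries = [
    AtlasEntry("Scottish Gaelic", "gla", "United Kingdom", Vitality.DEFINITELY_ENDANGERED),
    # Placeholder entry, not a real atlas record.
    AtlasEntry("Example language", "qaa", "Exampleland", Vitality.CRITICALLY_ENDANGERED),
]
for entry in search(entries, vitality=Vitality.DEFINITELY_ENDANGERED):
    print(entry.name, entry.iso_639_3)
```

Using an ordered enum makes it easy to ask questions such as "severely endangered or worse", although the degrees themselves are qualitative judgments rather than a computed score.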

When exactly is a language considered endangered? As explained on the website of the atlas: “A language is endangered when its speakers cease to use it, use it in fewer and fewer domains, use fewer of its registers and speaking styles, and/or stop passing it on to the next generation. No single factor determines whether a language is endangered.” UNESCO experts have identified nine factors to be considered: (1) intergenerational language transmission; (2) absolute number of speakers; (3) proportion of speakers within the total population; (4) shifts in domains of language use; (5) response to new domains and media; (6) availability of materials for language education and literacy; (7) governmental and institutional language attitudes and policies including official status and use; (8) attitudes of community members towards their own language; (9) amount and quality of documentation.

When and why do languages disappear? “A language disappears when its speakers disappear or when they shift to speaking another language — most often, a larger language used by a more powerful group. Languages are threatened by external forces such as military, economic, religious, cultural or educational subjugation, or by internal forces such as a community’s negative attitude towards its own language. Today, increased migration and rapid urbanization often bring along the loss of traditional ways of life and a strong pressure to speak a dominant language that is — or is perceived to be — necessary for full civic participation and economic advancement.”

The “Atlas of the World’s Languages in Danger” classifies Scottish Gaelic as a “definitely endangered” language. There were 59,000 Gaelic speakers in Scotland (just over 1 percent of the population), according to the 2011 census. These figures were much lower than the 200,000 Gaelic speakers (4.5 percent of the population) in the 1901 census.

This has not always been the case. For many centuries, everyone spoke Gaelic in Scotland and Ireland, and scholars disseminated their writings in Gaelic throughout Europe. Over the centuries, English gradually became the dominant language, including on the Scottish Western Isles, despite the presence of Scottish Gaelic as the first community language. The revival of Gaelic culture dates back to the early 19th century, in the form of poetry, prose and music. Between the two world wars, a radio channel began broadcasting the news in Gaelic, and Gaelic started being taught again in schools. Today, more novels are published in Gaelic than at any other time. Radio nan Gàidheal has broadcast in Gaelic since the 1980s, and the TV channel ALBA has offered shows in Gaelic since the early 2000s. Both have a web presence, which has boosted their audience.

A freelance translator from English to Scottish Gaelic, Michael Bauer has worked on several localization projects in his free time, “just for the love of it”, with a fellow localizer who goes only by GunChleoc (“a woman” in Scottish Gaelic) on the web, proof that a few people can do a lot for their language community. The localization projects included the Gaelic versions of the web browser Opera (in 2001), Firefox (Mozilla web browser), Thunderbird (Mozilla messaging), Lightning (Mozilla calendar), Google Chrome, OpenOffice, LibreOffice, the VLC media player, the game Freeciv (the open source version of the game Civilization), and Accentuate.us (software that automatically inserted accents). Michael Bauer also created the spell checker An Dearbhair Beag with Kevin Scannell. Since 2012, he has worked on a few paid projects with GunChleoc, for example the Gaelic language packs for Microsoft Windows and Microsoft Office.

Wikipedia has its Gaelic version, named Uicipeid. There are three major online dictionaries in Scottish Gaelic. The first dictionary is Stòr-dàta, an online dictionary that is mostly a word list managed by the college Sabhal Mòr Ostaig, on the Isle of Skye, a college where all the courses are taught in Scottish Gaelic. The second dictionary is the Dwelly, a Gaelic dictionary published in 1911, which is to Gaelic what the Oxford English Dictionary is to English. Its digital version is the result of a ten-year labour of love by Michael Bauer and his colleague Will Robertson. The third dictionary is Am Faclair Beag, which means “small dictionary” but is actually a large dictionary offering both the Dwelly and more modern data, also created and maintained by Michael Bauer and Will Robertson.

Michael Bauer wrote in October 2015 in an email interview: “There are, sadly, far too few users and there are some aspects which actually actively limit usage. For example, Gaelic schools cannot install Gaelic software because the IT contracts are given by the councils to outside IT companies who only provide English software and operating systems. Because they limit the admin rights of the users at schools, this means it is very difficult to install software which is not on their official ‘list’ and because Gaelic is not mentioned in the contract, they don’t put it there. Free and open software has helped carve out more of a space on the web for Gaelic, and cooperating with commercial long-term partners is helping to produce some very useful enabling technologies such as the predictive texting tool Adaptxt or the upcoming text-to-speech tool with Cereproc.”

“A central storage space for translations would be useful for localization projects, with a shared translation memory, thus avoiding to endlessly retranslate the same terms, phrases and sentence segments. If the translations could be available from the same site, like a meta-Pootle [a community localization server], everyone working for the revival of a minority language on the web would benefit from it. There actually was/is something a bit like that, Ubuntu’s Launchpad, but unfortunately there is not enough coordination between Launchpad and the projects and much effort is going to waste by people working on Launchpad and the translations not going anywhere. There is also AmaGama these days which is something like that but not commonly used apart from some like Mozilla and LibreOffice (I think). Part of the problem is there are so many platforms these days, all trying to carve out a niche… some of them commercial, like Transifex or Crowdin.”
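The shared translation memory Michael Bauer describes can be pictured with a minimal Python sketch. It does not reproduce how Pootle, Launchpad, AmaGama or any other platform works: the `TranslationMemory` class, its fuzzy matching based on the standard library's `difflib`, the 0.8 threshold and the placeholder translation string are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy shared translation memory mapping source segments to stored translations."""

    def __init__(self) -> None:
        self._segments: dict[str, str] = {}

    def add(self, source: str, target: str) -> None:
        """Store a translation contributed by any project."""
        self._segments[source] = target

    def lookup(self, source: str, threshold: float = 0.8):
        """Return (stored_source, translation, score) for the closest match, or None."""
        best = None
        for stored, target in self._segments.items():
            score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
            if best is None or score > best[2]:
                best = (stored, target, score)
        if best is not None and best[2] >= threshold:
            return best
        return None


tm = TranslationMemory()
# Placeholder string standing in for a real Gaelic translation.
tm.add("Save file", "<Gaelic translation of 'Save file'>")

print(tm.lookup("Save file"))       # exact match, reusable as is
print(tm.lookup("Save files"))      # close match, offered as a suggestion
print(tm.lookup("Print document"))  # no usable match, needs a new translation
```

A localizer working on a second project would query the memory first and translate only the segments that return no match, which is the retranslation effort the quote hopes to avoid.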

What is the best way to help language revitalization efforts? As an example, to celebrate International Dictionary Day on 16 October 2015, 15 new and revised dictionaries of South African indigenous languages were published online during the following weeks. Some indigenous languages still need dictionaries, grammars and glossaries. Others even need basic language technologies such as keyboard settings or spell checkers.

Copyright © 2017-19 Marie Lebert
License CC BY-NC-SA version 4.0

Written by marielebert

2017-10-24 at 18:01
