Minority and endangered languages on the web

By Marie Lebert, 3 December 2019.

Please see also:
* The web, from English to multilingual
* The multilingual web — timeline

Ethnologue: Languages of the World

“Ethnologue: Languages of the World”, a reference catalogue published in print by SIL International, launched its free online version in 1996, with a full description of the 6,700 living languages spoken in 228 countries.

As explained in January 2000 by Barbara Grimes, editor of the Ethnologue since 1971, in an email interview: “The Ethnologue is a catalogue of the languages of the world, with information about where they are spoken, an estimate of the number of speakers, what language family they are in, alternate names, names of dialects, other socio-linguistic and demographic information, dates of published Bibles, a name index [the Ethnologue Name Index], a language family index [the Ethnologue Language Index], and language maps. (…) We have had requests for the Ethnologue in a few other languages, but we do not have the personnel or funds to do the translation or maintenance, since it is constantly being updated.”

The Ethnologue was founded in 1951 by Richard Pittman as a catalogue of minority languages, to share information on language development needs with his colleagues at SIL International (formerly known as the Summer Institute of Linguistics) and with other language researchers around the globe. In 1971, its coverage was expanded from minority languages to all known languages of the world, with the help of thousands of linguists from partner organisations. From 1967 to 1973, Barbara Grimes supervised an in-depth revision of the information available for Africa, the Americas, the Pacific and part of Asia. The number of identified languages grew from 4,493 to 6,809, with more information recorded on each language in the computer database created in 1971.

What exactly is a language? According to the website of the Ethnologue: “How one chooses to define a language depends on the purposes one has in identifying one language as being distinct from another. Some base their definition on purely linguistic grounds, focusing on lexical and grammatical differences. Others may see social, cultural, or political factors as being primary. In addition, speakers themselves often have their own perspectives on what makes a particular language uniquely theirs. Those are frequently related to issues of heritage and identity much more than to the actual linguistic features. In addition, it is important to recognise that not all languages are oral. Sign languages constitute an important class of linguistic varieties that merit consideration.”

A new edition of the Ethnologue was published approximately every four years until 2012. Since 2013, the Ethnologue has been published every year, with the online version available before the printed version, to keep up with the fast pace of the internet. The Ethnologue identified 6,909 living languages in 2009, 7,102 living languages in 2015, and 7,099 living languages in 2017. A paid subscription model for heavy users in high-income countries was introduced in 2016 to sustain the Ethnologue project.

At the invitation of the International Organisation for Standardisation (ISO) in 2002, SIL International worked on the new standard ISO 639-3 (2007), which reconciled the complete set of three-letter identifiers used in the Ethnologue since the inception of its database in 1971 with the 400 three-letter codes used in the previous standard ISO 639-2 (1998), as well as other three-letter codes developed by the Linguist List for ancient and constructed languages. (The first ISO standard to identify languages was ISO 639-1 (1988), a set of two-letter language codes.)

Approved in 2006 and published in 2007, ISO 639-3 (2007) has provided three-letter codes for identifying 7,589 languages (living and extinct, ancient and constructed, major and minor, written and unwritten), including sign languages. SIL International was named the registration authority for the language identifier inventory, and has administered the annual cycle of changes and updates since then.
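To make the relationship between the two-letter and three-letter codes concrete, here is a minimal sketch in Python. It only covers four languages mentioned in this article, and the names SAMPLE_CODES and to_three_letter are hypothetical helpers made up for illustration, not part of any official tool. The full inventory is the one administered by SIL International.

```python
# Illustrative sample only: (ISO 639-1, ISO 639-3) codes for four languages
# mentioned in this article. The real ISO 639-3 inventory is far larger.
SAMPLE_CODES = {
    "Haitian Creole": ("ht", "hat"),
    "Scottish Gaelic": ("gd", "gla"),
    "Basque": ("eu", "eus"),
    "Welsh": ("cy", "cym"),
}


def to_three_letter(two_letter):
    """Return the ISO 639-3 code for a two-letter ISO 639-1 code, or None."""
    wanted = two_letter.strip().lower()
    for iso1, iso3 in SAMPLE_CODES.values():
        if iso1 == wanted:
            return iso3
    return None  # not in this small sample


print(to_three_letter("gd"))  # prints "gla"
```

Most of the languages in ISO 639-3 have no two-letter code at all, which is one reason a three-letter inventory was needed.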

Minority languages on the web

Guy Antoine, a Haitian-American software developer born in Haiti and living in New York, founded the website Windows on Haiti in April 1998 in his free time to promote Haitian Creole, a French-based creole language spoken not only in Haiti but also in the Dominican Republic, the United States, Canada, and other countries.

Guy Antoine wrote in June 2001 in an email interview: “Who are the Haitian people without Kreyól [Haitian Creole], the language that has evolved and bound various African tribes transplanted in Haiti during the slavery period? It is the most palpable exponent of commonality that defines us as a people. However, it is primarily a spoken language, not a widely written one. I see the web changing this situation more than any traditional means of language dissemination. Our site aims to be a major source of information about Haitian culture, and a tool to counter the persistently negative images of Haiti from the traditional media. The scope of this effort extends beyond mere commentary to the diversity of arts and history, cuisine and music, literature and reminiscences of traditional Haitian life. In short, the site opens some new windows to the culture of Haiti.”

“The primary language of Windows on Haiti is English, but one will equally find a center of lively discussion conducted in Kreyól. In addition, one will find documents related to Haiti in French, in the old colonial Creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web. Kreyól is the only national language of Haiti, and one of its two official languages, the other being French. It is hardly a minority language in the Caribbean context, since it is spoken by eight to ten million people. I have created two discussion forums on my website, held exclusively in Kreyól. One is for general discussions just about everything but obviously more focused on Haiti’s current sociopolitical problems. The other is reserved only to debates of writing standards for Kreyól. Those debates have been quite spirited with the participation of a number of linguistic experts. The uniqueness of these forums is their non-academic nature.”

According to some linguists, a language dies every 14 days. A good way to counter this trend is to bring together language communities via the internet, to help revitalise these languages through digital technology, and to strengthen the presence of language communities in social media.

Kevin Scannell, a computer scientist and professor at Saint Louis University, Missouri, created the website Indigenous Tweets in his free time to identify tweets in indigenous and minority languages. He designed An Crúbadán, statistical software that crawls the web to find Twitter threads. Indigenous Tweets identified 35 languages in March 2011, 71 languages in April 2011, 144 languages in March 2013, and 184 languages in October 2017.

Indigenous Tweets’ home page lists all languages identified as being active on Twitter. People click on the corresponding row of the language they are interested in, and are redirected to a new page that lists users in that language (with a maximum of 500 users) and statistics for each user: number of tweets, number of followers, percentage of tweets in the given language (some users tweet in both a global language and a minority language), and date of the latest tweet. The main minority languages are Haitian Creole (with users from the Caribbean, North America and other places), Basque and Welsh. People can also get in touch directly via Twitter. A number of joint projects started this way.

As explained by Kevin Scannell on his blog: “Together we’re breaking down the idea that only global languages like English and French have a place online! The primary aim of Indigenous Tweets is to help build online language communities through Twitter. We hope that the site makes it easier for speakers of indigenous and minority languages to find each other in the vast sea of English, French, Spanish, and other global languages that dominate Twitter. Even speakers of languages like Basque and Welsh with vibrant online communities have been surprised to find just how many people there are tweeting in their language. This is the other goal of Indigenous Tweets: it’s a message to the world that says ‘We are here and we’re proud of our languages’. For languages with just a few users, I hope it inspires some people to start — make your voice heard!”

Kevin Scannell created a second website, Indigenous Blogs, in September 2011 to identify blogs written in indigenous and minority languages, and to offer a similar platform for people to get in touch. He began with blogs hosted by Blogspot (which also hosts his own blog), WordPress and Tumblr. Indigenous Blogs identified blogs in 50 languages in September 2011, in 74 languages in March 2013, and in 85 languages in October 2017.

Endangered languages on the web

In 2010, UNESCO (United Nations Educational, Scientific and Cultural Organisation) published the free online version of its “Atlas of the World’s Languages in Danger” alongside the printed edition (3rd edition, 2010), edited by Christopher Moseley. The previous editions of the atlas, in 1996 and 2001, existed only in print.

Available in three languages (English, French, Spanish) like the printed edition, the online atlas included 2,473 languages in 2010 and 2,464 languages in 2017. It can be searched by country and area, language name, number of speakers from/to, language vitality, and ISO 639-3 code. The alternate names of the languages (spelling variants, dialects, names in non-Roman scripts) are also provided.

UNESCO experts have established six degrees (safe, vulnerable, definitely endangered, severely endangered, critically endangered, extinct) to define the vitality or endangerment of a language. (1) “Safe” (not included in the atlas) means that the language is spoken by all generations and that inter-generational transmission is uninterrupted. (2) “Vulnerable” means that most children speak the language, but it may be restricted to certain places, for example at home. (3) “Definitely endangered” means that children no longer learn the language as a native language at home. (4) “Severely endangered” means that the language is spoken by grandparents and older generations; the parent generation may understand it, but doesn’t use it with their children or among themselves. (5) “Critically endangered” means that the youngest speakers are grandparents and older, who speak the language partially and infrequently. (6) “Extinct” means that there are no speakers left; the atlas includes languages presumed to have become extinct since the 1950s.
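The six degrees form an ordered scale, which can be sketched as a small Python enumeration. This is only an illustration of the classification described above, not UNESCO's own data model. The names Vitality, listed_in_atlas and at_least_as_endangered are assumptions made up for this example, and the numbering simply follows the (1) to (6) order used in the paragraph.

```python
from enum import IntEnum


class Vitality(IntEnum):
    """UNESCO degrees of vitality, numbered as in the paragraph above."""
    SAFE = 1
    VULNERABLE = 2
    DEFINITELY_ENDANGERED = 3
    SEVERELY_ENDANGERED = 4
    CRITICALLY_ENDANGERED = 5
    EXTINCT = 6


def listed_in_atlas(degree):
    """Safe languages are not included in the atlas; all other degrees are."""
    return degree != Vitality.SAFE


def at_least_as_endangered(degree, threshold):
    """True if `degree` is at or beyond `threshold` on the scale."""
    return degree >= threshold


degree = Vitality.DEFINITELY_ENDANGERED
print(listed_in_atlas(degree))  # prints True
print(at_least_as_endangered(degree, Vitality.SEVERELY_ENDANGERED))  # prints False
```

Using an ordered type makes it straightforward to filter a list of languages by a minimum degree of endangerment, much as the online atlas allows searching by language vitality.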

When exactly is a language considered endangered? As explained on the website of the atlas: “A language is endangered when its speakers cease to use it, use it in fewer and fewer domains, use fewer of its registers and speaking styles, and/or stop passing it on to the next generation. No single factor determines whether a language is endangered.” UNESCO experts have identified nine factors to be considered: (1) inter-generational language transmission; (2) absolute number of speakers; (3) proportion of speakers within the total population; (4) shifts in domains of language use; (5) response to new domains and media; (6) availability of materials for language education and literacy; (7) governmental and institutional language attitudes and policies including official status and use; (8) attitudes of community members towards their own language; (9) amount and quality of documentation.

When and why do languages disappear? “A language disappears when its speakers disappear or when they shift to speaking another language — most often, a larger language used by a more powerful group. Languages are threatened by external forces such as military, economic, religious, cultural or educational subjugation, or by internal forces such as a community’s negative attitude towards its own language. Today, increased migration and rapid urbanisation often bring along the loss of traditional ways of life and a strong pressure to speak a dominant language that is — or is perceived to be — necessary for full civic participation and economic advancement.”

The UNESCO atlas classifies Scottish Gaelic as a “definitely endangered” language. According to the 2011 census, there were 59,000 Gaelic speakers in Scotland (over 1 percent of the population), far fewer than the 200,000 Gaelic speakers (4.5 percent of the population) recorded in the 1901 census.

This has not always been the case. For many centuries, everyone spoke Gaelic in Scotland and Ireland, and scholars disseminated their writings in Gaelic throughout Europe. Over the centuries, English gradually became the dominant language, including on the Scottish Western Isles, despite the presence of Scottish Gaelic as the first community language. The revival of Gaelic culture dates back to the early 19th century, in the form of poetry, prose and music. Between the two world wars, a radio channel began broadcasting the news in Gaelic, and Gaelic began to be taught again in schools. Today, more novels are published in Gaelic than at any other time. Radio nan Gàidheal has broadcast in Gaelic since the 1980s, and the TV channel ALBA has offered shows in Gaelic since the early 2000s. Both have a web presence, which has boosted their audience. Wikipedia has its Gaelic version, named Uicipeid.

Michael Bauer, a freelance translator from English to Scottish Gaelic, has worked on several localisation projects in his free time, “just for the love of it”, with a fellow localiser who goes only by GunChleoc (“a woman” in Scottish Gaelic) on the web, proof that a few people can do a lot for their language community. The localisation projects included the Gaelic versions of the web browser Opera (as early as 2001), Firefox (Mozilla web browser), Thunderbird (Mozilla messaging), Lightning (Mozilla calendar), Google Chrome, OpenOffice, LibreOffice, the VLC media player, the game Freeciv (the open source version of the game Civilization), and Accentuate.us (software that inserts accents). Michael Bauer also created the spell checker An Dearbhair Beag with Kevin Scannell. Since 2012, he has worked on a few paid projects with GunChleoc, for example the Gaelic language packs for Microsoft Windows and Microsoft Office.

There are three online dictionaries in Scottish Gaelic. The first is Stòr-dàta, mostly a word list, managed by Sabhal Mòr Ostaig, a college on the Isle of Skye where all the courses are taught in Scottish Gaelic. The second is the Dwelly, a Gaelic dictionary published in 1911, which is to Gaelic what the Oxford English Dictionary is to English. Its digital edition is the result of a ten-year labour of love by Michael Bauer and his colleague Will Robertson. The third is Am Faclair Beag, whose name means “small dictionary” but which is actually a large dictionary offering both the Dwelly and more modern data, also created and maintained by Michael Bauer and Will Robertson.

Michael Bauer wrote in October 2015 in an email interview: “There are, sadly, far too few users and there are some aspects which actually actively limit usage. For example, Gaelic schools cannot install Gaelic software because the IT contracts are given by the councils to outside IT companies who only provide English software and operating systems. Because they limit the admin rights of the users at schools, this means it is very difficult to install software which is not on their official ‘list’ and because Gaelic is not mentioned in the contract, they don’t put it there. Free and open software has helped carve out more of a space on the web for Gaelic, and cooperating with commercial long-term partners is helping to produce some very useful enabling technologies such as the predictive texting tool Adaptxt or the upcoming text-to-speech tool with Cereproc.”

“A central storage space for translations would be useful for localisation projects, with a shared translation memory, thus avoiding to endlessly retranslate the same terms, phrases and sentence segments. If the translations could be available from the same site, like a meta-Pootle [a community localisation server], everyone working for the revival of a minority language on the web would benefit from it. There actually was/is something a bit like that, Ubuntu’s Launchpad, but unfortunately there is not enough coordination between Launchpad and the projects and much effort is going to waste by people working on Launchpad and the translations not going anywhere. There is also AmaGama these days which is something like that but not commonly used apart from some like Mozilla and LibreOffice (I think). Part of the problem is there are so many platforms these days, all trying to carve out a niche… some of them commercial, like Transifex or Crowdin.”
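The idea of a shared translation memory can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Pootle, Launchpad, AmaGama or any other platform actually works: the class TranslationMemory, its methods and the project names are hypothetical, the target strings are placeholders rather than real Gaelic translations, and only exact matches (after normalising case and whitespace) are found, whereas translation memory tools commonly offer fuzzy matching as well.

```python
class TranslationMemory:
    """A toy shared store of source segments and their translations."""

    def __init__(self):
        # normalised source segment -> (translation, contributing project)
        self._segments = {}

    @staticmethod
    def _normalise(segment):
        # Collapse whitespace and ignore case so trivial variants still match.
        return " ".join(segment.split()).casefold()

    def add(self, source, target, project):
        # Keep the first translation recorded for a segment.
        self._segments.setdefault(self._normalise(source), (target, project))

    def lookup(self, source):
        entry = self._segments.get(self._normalise(source))
        return entry[0] if entry else None


tm = TranslationMemory()
# Placeholder target text, contributed by a hypothetical first project.
tm.add("Open file", "<gd translation of 'Open file'>", project="project-a")

# A second project meets the same segment and reuses the stored translation.
print(tm.lookup("open  FILE"))
print(tm.lookup("Save as"))  # prints None: not translated yet
```

With a single store of this kind, a term translated once for one project would not need to be retranslated for the next.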

What is the best way to help language revitalisation efforts? Many minority, indigenous and endangered languages still need language dictionaries, grammars and glossaries. Some of them even need basic language technologies such as keyboard settings or spell checkers.

As explained on his blog by Kevin Scannell, who founded the websites Indigenous Tweets and Indigenous Blogs in 2011: “Speakers of indigenous and minority languages around the world are struggling to keep their languages and cultures alive. More and more language groups are turning to the web as a tool for language revitalisation, and as a result there are now thousands of people blogging and using social media sites like Facebook and Twitter in their native language. These sites have allowed sometimes scattered communities to connect and use their languages online in a natural way. Social media have also been important in engaging young people, who are the most important demographic in language revitalisation efforts.”

Copyright © 2017-19 Marie Lebert
License CC BY-NC-SA version 4.0

Written by marielebert

2017-10-24 at 18:01
