
Minority languages on the web

By Marie Lebert, October 2018.

“I do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. The web is an affordable boon for cultural expression.” (Robert Beard)

[French version]
[Spanish version]

* The Ethnologue, a catalogue of all living languages
* An atlas of the world’s endangered languages
* First example: Haitian Creole
* Second example: Scottish Gaelic
* Third example: Indigenous Tweets
* There is still a lot to do

The Ethnologue, a catalogue of all living languages

The Ethnologue: Languages of the World, a reference catalogue published since the 1950s, started its free online version in 1996, with a full description of the 6,700 living languages spoken in 228 countries. A paid subscription model for heavy users in high-income countries was launched in 2016 in order to sustain the Ethnologue project. Published in February 2017, the 20th edition of the Ethnologue included 7,099 living languages.

As explained in January 2000 by Barbara Grimes, its editor from 1971 to 2000, in an email interview: “The Ethnologue is a catalogue of the languages of the world, with information about where they are spoken, an estimate of the number of speakers, what language family they are in, alternate names, names of dialects, other sociolinguistic and demographic information, dates of published Bibles, a name index, a language family index, and language maps. (…) Multilingual web pages are more widely useful, but much more costly to maintain. We have had requests for the Ethnologue in a few other languages, but we do not have the personnel or funds to do the translation or maintenance, since it is constantly being updated.”

What exactly is a language? According to the Ethnologue’s website: “How one chooses to define a language depends on the purposes one has in identifying one language as being distinct from another. Some base their definition on purely linguistic grounds, focusing on lexical and grammatical differences. Others may see social, cultural, or political factors as being primary. In addition, speakers themselves often have their own perspectives on what makes a particular language uniquely theirs. Those are frequently related to issues of heritage and identity much more than to the actual linguistic features. In addition, it is important to recognize that not all languages are oral. Sign languages constitute an important class of linguistic varieties that merit consideration.”

The Ethnologue was founded in 1951 by Richard Pittman as a catalog of minority languages, to share information on language development needs around the world with his colleagues at SIL International (formerly known as the Summer Institute of Linguistics) and other language researchers. In 1971, information was expanded from primarily minority languages to encompass all known languages of the world, with the help of thousands of linguists from partner organisations. Between 1967 and 1973, Barbara Grimes completed an in-depth revision of the information available for Africa, the Americas, the Pacific, and a few countries of Asia. The number of identified languages grew from 4,493 to 6,809, with more information recorded on each language in a computer database.

A new edition of the Ethnologue was published approximately every four years until 2012. Since 2013, the Ethnologue has been published every year, to keep up with the fast pace of the internet, and the online version is now available before the printed version. The Ethnologue identified 6,700 living languages in 1996, 6,909 living languages in 2009, 7,102 living languages in 2015, and 7,099 living languages in 2017.

At the invitation of the International Organization for Standardization (ISO) in 2002, SIL International worked for five years on the new standard ISO 639-3 (2007), which reconciled the complete set of codes used in the Ethnologue since 1971 with the 400 codes already in use in ISO 639-2 (1998), as well as other codes developed by the Linguist List for ancient and constructed languages. ISO 639-3 (2007) has provided three-letter codes for identifying 7,589 languages (living and extinct, ancient and constructed, major and minor, written and unwritten), including sign languages. SIL International was named the registration authority for the inventory of language identifiers, and administers the annual cycle for changes and updates.

An atlas of the world’s endangered languages

In 2010, UNESCO (United Nations Educational, Scientific and Cultural Organization) launched its free online Atlas of the World’s Languages in Danger, as a complement to the paid trilingual (English, French, Spanish) printed edition (3rd edition, 2010) edited by Christopher Moseley. Previous editions in 1996 and 2001 only existed in print, unlike the Ethnologue, which started offering a free web version in 1996.

The atlas included 2,473 languages in 2010, and 2,464 languages in July 2017. It can be searched by country and area, language name, number of speakers from/to, vitality and ISO 639-3 code. The names of the 2,464 languages are transcribed into English, French and Spanish, and their alternate names (spelling variants, dialects, and names in non-Roman scripts) are also provided.

UNESCO experts have established six degrees (safe, vulnerable, definitely endangered, severely endangered, critically endangered, extinct) to define the vitality or endangerment of a language. (1) “Safe” (not included in the atlas) means that the language is spoken by all generations and that intergenerational transmission is uninterrupted. (2) “Vulnerable” means that most children speak the language, but it may be restricted to certain places, for example at home. (3) “Definitely endangered” means that children no longer learn the language as a mother tongue at home. (4) “Severely endangered” means that the language is spoken by grandparents and older generations; the parent generation may understand it, but does not use it with their children or among themselves. (5) “Critically endangered” means that the youngest speakers are grandparents and older, who speak the language partially and infrequently. (6) “Extinct” means there are no speakers left; the atlas includes languages presumed extinct since the 1950s.
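Because the six degrees form an ordered scale, they can be modelled as an ordered enumeration. The following is only a minimal sketch in Python: the names `Vitality` and `in_atlas` are hypothetical and do not come from UNESCO or from the atlas website.

```python
from enum import IntEnum


class Vitality(IntEnum):
    """The six UNESCO degrees, ordered from least to most endangered."""
    SAFE = 1
    VULNERABLE = 2
    DEFINITELY_ENDANGERED = 3
    SEVERELY_ENDANGERED = 4
    CRITICALLY_ENDANGERED = 5
    EXTINCT = 6


def in_atlas(level: Vitality) -> bool:
    """Return True for every degree except SAFE, which the atlas leaves out."""
    return level is not Vitality.SAFE


# Degrees compare by severity because IntEnum members behave like integers.
levels = [Vitality.EXTINCT, Vitality.SAFE, Vitality.DEFINITELY_ENDANGERED]
listed = sorted(level for level in levels if in_atlas(level))
print([level.name for level in listed])
```

Using an ordered type makes comparisons such as "at least severely endangered" a simple `>=` test.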

When exactly is a language considered endangered? As explained on the website: “A language is endangered when its speakers cease to use it, use it in fewer and fewer domains, use fewer of its registers and speaking styles, and/or stop passing it on to the next generation. No single factor determines whether a language is endangered.” UNESCO experts have identified nine factors to be considered: (1) intergenerational language transmission; (2) absolute number of speakers; (3) proportion of speakers within the total population; (4) shifts in domains of language use; (5) response to new domains and media; (6) availability of materials for language education and literacy; (7) governmental and institutional language attitudes and policies including official status and use; (8) community members’ attitudes towards their own language; (9) amount and quality of documentation.

When and why do languages disappear? As explained on the website: “A language disappears when its speakers disappear or when they shift to speaking another language — most often, a larger language used by a more powerful group. Languages are threatened by external forces such as military, economic, religious, cultural or educational subjugation, or by internal forces such as a community’s negative attitude towards its own language. Today, increased migration and rapid urbanization often bring along the loss of traditional ways of life and a strong pressure to speak a dominant language that is — or is perceived to be — necessary for full civic participation and economic advancement.”

First example: Haitian Creole

Guy Antoine, a Haitian-American born in Haiti and living in New York, founded the website Windows on Haiti in April 1998 to promote Haitian Creole, a French-based creole language spoken by most Haitians, and also spoken in the Dominican Republic, Canada, the United States and other countries.

Guy Antoine wrote in June 2001 in an email interview: “Who are the Haitian people without “Kreyól” (Creole for the non-initiated), the language that has evolved and bound various African tribes transplanted in Haiti during the slavery period? It is the most palpable exponent of commonality that defines us as a people. However, it is primarily a spoken language, not a widely written one. I see the web changing this situation more so than any traditional means of language dissemination. (…) One soon realizes that the language of a people is an essential and inextricable part of its culture.”

What is Windows on Haiti’s goal? “The site aims to be a major source of information about Haitian culture, and a tool to counter the persistently negative images of Haiti from the traditional media. The scope of this effort extends beyond mere commentary to the diversity of arts and history, cuisine and music, literature and reminiscences of traditional Haitian life. In short, the site opens some new windows to the culture of Haiti. (…) I hope to associate myself with the right group of people to go beyond Haiti, and advance towards this ideal of one world, one love. (…) The web is an interconnected network of servers and personal computers, at the keyboard of which you will find a person, an individual.”

What are Windows on Haiti’s languages? “The primary language of the site is English, but one will equally find a center of lively discussion conducted in ‘Kreyól’ [Haitian Creole]. In addition, one will find documents related to Haiti in French, in the old colonial Creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web. (…) Kreyól is the only national language of Haiti, and one of its two official languages, the other being French. It is hardly a minority language in the Caribbean context, since it is spoken by eight to ten million people. (…) I have created two discussion forums on my website, held exclusively in Kreyól. One is for general discussions on just about everything but obviously more focused on Haiti’s current socio-political problems. The other is reserved only to debates of writing standards for Kreyol. Those debates have been quite spirited and have met with the participation of a number of linguistic experts. The uniqueness of these forums is their non-academic nature.”

Second example: Scottish Gaelic

Scottish Gaelic is a Celtic language traditionally spoken in Scotland, a country that is part of the United Kingdom and covers the northern third of Great Britain. According to the 2011 census, 59,000 people spoke Gaelic, just over 1 percent of the population. These figures are much lower than those of the 1901 census, which recorded 200,000 Gaelic speakers, or 4.5 percent of the population.

The UNESCO Atlas of the World’s Languages in Danger now classifies Gaelic as a “definitely endangered” language, but this has not always been the case. For many centuries, everyone spoke Gaelic in Scotland and Ireland, and academics disseminated their writings in Gaelic throughout Europe. Over the centuries, English gradually became the dominant language, including on the Scottish Western Isles, despite the presence of Scottish Gaelic as the first community language. The revival of Gaelic culture dates back to the early 19th century, in the form of poetry, prose and music. Between the two world wars, a radio channel began broadcasting the news in Gaelic, and Gaelic started being taught again in schools. Today, more novels are published in Gaelic than at any other time. Radio nan Gàidheal has broadcast in Gaelic since the 1980s, and the TV channel ALBA since the early 2000s. Both have a web presence, which has boosted their audience.

What about Scottish Gaelic on the web? Wikipedia has its Gaelic version, named Uicipeid. Opera was the first browser to offer an interface in Gaelic, in 2001, followed by Firefox and Google Chrome, thanks to Michael Bauer. A freelance translator from English to Scottish Gaelic, Michael Bauer has worked on several localization projects in his free time, “just for the love of it, with a fellow localizer who on the web only goes by GunChleoc [“a woman” in Scottish Gaelic]”, proof that a few people can do a lot for their language community. The localization projects included the Gaelic versions of Thunderbird (Mozilla Messaging), Lightning (Mozilla Calendar), OpenOffice, LibreOffice, the VLC media player, the game Freeciv (the open source version of the game Civilization), and a program that automatically inserted accents. The Firefox application to change the interface from English to Gaelic has been used by 500 users daily since 2011. Michael Bauer also created a spell checker named An Dearbhair Beag, with Kevin Scannell. Since 2012, he has worked on a few paid projects with his fellow localizer GunChleoc, for example Microsoft’s Gaelic language pack, first for Windows, then for Office and other smaller projects.

What about dictionaries? There are three major online dictionaries in Scottish Gaelic. The first dictionary is Stòr-dàta, which is mostly the database of a word list managed by Sabhal Mòr Ostaig, a college on the Isle of Skye, Scotland, where all the courses are taught in Scottish Gaelic. The second dictionary is the Dwelly, a Gaelic dictionary published in 1911, which is to Gaelic what the Oxford English Dictionary is to English. Its digital version is the result of ten years of work by Michael Bauer and his colleague Will Robertson. The third dictionary is Am Faclair Beag, which means “small dictionary” but is actually a large dictionary offering both the Dwelly and more modern data, also maintained by Michael Bauer and Will Robertson.

Does free and open source software help the spread of Gaelic? Michael Bauer wrote in October 2015 in an email interview: “There are, sadly, far too few users and there are some aspects which actually actively limit usage. For example, Gaelic schools cannot install Gaelic software because the IT contracts are given by the councils to outside IT companies who only provide English software and operating systems. Because they limit the admin rights of the users at schools, this means it is very difficult to install software which is not on their official ‘list’ and because Gaelic is not mentioned in the contract, they don’t put it there. I would say it’s fair that free and open software has *helped* carve out more of a space on the web for Gaelic and that cooperating with commercial long-term partners is helping to produce some very useful enabling technologies such as the predictive texting tool Adaptxt or the upcoming text-to-speech tool with Cereproc.”

According to Michael Bauer, a central storage space for translations would be useful for localization projects, with a shared translation memory, thus avoiding the need to endlessly retranslate the same terms, phrases and sentence segments. If the software translations could be available from the same site, like a meta-Pootle (a community localization server), everyone working for the revival of a minority language on the web would benefit from it. “There actually was/is something a bit like that, Ubuntu’s Launchpad but unfortunately there is not enough coordination between Launchpad and the projects and much effort is going to waste by people working on Launchpad and the translations not going anywhere. There is also AmaGama these days which is something like that but not commonly used apart from some like Mozilla and LibreOffice (I think). Part of the problem is there are so many platforms these days, all trying to carve out a niche… some of them commercial, like Transifex or Crowdin.”
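To make the idea of a shared translation memory concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class `TranslationMemory` and its methods are hypothetical, it is not how Pootle, Launchpad or AmaGama work internally, and the Gaelic string is an illustrative entry, not taken from any project's glossary. The memory stores segments that have already been translated, returns an exact match when there is one, and otherwise suggests the closest stored segment using the standard library's `difflib`.

```python
from difflib import get_close_matches


class TranslationMemory:
    """A toy store of already-translated segments (hypothetical design)."""

    def __init__(self):
        self._segments = {}  # normalized source segment -> translation

    def add(self, source, target):
        self._segments[source.strip().lower()] = target

    def lookup(self, source, cutoff=0.8):
        """Return (translation, is_exact); translation is None if nothing fits."""
        key = source.strip().lower()
        if key in self._segments:
            return self._segments[key], True
        # Fall back to the most similar stored segment above the cutoff.
        close = get_close_matches(key, list(self._segments), n=1, cutoff=cutoff)
        if close:
            return self._segments[close[0]], False
        return None, False


tm = TranslationMemory()
tm.add("Open file", "Fosgail faidhle")  # illustrative entry only
print(tm.lookup("open file"))    # exact match, reused as is
print(tm.lookup("Open files"))   # close match, offered as a suggestion to review
print(tm.lookup("Quit"))         # nothing close enough, needs a new translation
```

A fuzzy suggestion still needs a human check, which is why the sketch reports whether the match was exact.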

Third example: Indigenous Tweets

The website Indigenous Tweets was launched in March 2011 by Kevin Scannell in his free time to identify tweets in minority languages, and to strengthen the online presence of indigenous linguistic communities. The website started with 35 languages and covered 71 languages three weeks later, 144 languages in March 2013 and 184 languages in October 2017.

To find these tweets, Kevin Scannell, a computer scientist and professor at Saint Louis University, Missouri, designed a statistical web crawler named An Crúbadán. Indigenous Tweets’ home page lists all minority languages identified as being active on Twitter. Visitors look for the language they want, click on the corresponding row, and are redirected to a new page that lists users in that language (with a maximum of 500 users) and statistics for each user: number of tweets, number of followers, percentage of tweets in the given language (some users tweet in both a global language and a minority language), and date of the last tweet. People can also get in touch directly through Twitter; a number of joint projects in minority languages have started this way.
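The per-user statistics described above can be illustrated with a short Python sketch. This is not the code behind Indigenous Tweets or An Crúbadán: the function `language_share`, the sample data and the language tags are hypothetical, and the sketch assumes that each tweet has already been labelled with a language.

```python
from collections import Counter


def language_share(tweets, language, max_users=500):
    """Rank users by tweets in `language`, with their share of tweets in it.

    `tweets` is an iterable of (user, language_tag) pairs (assumed input).
    Returns at most `max_users` rows of (user, tweets_in_language, percent).
    """
    total = Counter()
    in_language = Counter()
    for user, tag in tweets:
        total[user] += 1
        if tag == language:
            in_language[user] += 1
    rows = [
        (user, count, 100.0 * count / total[user])
        for user, count in in_language.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:max_users]


# Hypothetical sample: "gd" stands for Scottish Gaelic, "en" for English.
sample = [("a", "gd"), ("a", "en"), ("b", "gd"), ("b", "gd"), ("c", "en")]
for user, count, percent in language_share(sample, "gd"):
    print(user, count, f"{percent:.0f}%")
```

Users who never tweet in the chosen language do not appear in the result, and the `max_users` cap mirrors the limit of 500 users per page.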

According to Kevin Scannell, people tweeted in 500 global or minority languages in March 2011. Around 1,000 languages had a significant presence on the web through primary texts (newspapers, blog posts, Wikipedia articles, Bible translations). The main minority languages were Haitian Creole (with users from the Caribbean, North America and other places), Basque and Welsh.

As explained by Kevin Scannell on his blog in March 2011: “The primary aim of Indigenous Tweets is to help build online language communities through Twitter. We hope that the site makes it easier for speakers of indigenous and minority languages to find each other in the vast sea of English, French, Spanish, and other global languages that dominate Twitter. (…) Even speakers of languages like Basque and Welsh with vibrant online communities have been surprised to find just how many people there are tweeting in their language. This is the other goal of Indigenous Tweets: it’s a message to the world that says ‘We are here and we’re proud of our languages’. For languages with just a few users, I hope it inspires some people to start — make your voice heard!”

The internet is a good tool for language revitalization: “Speakers of indigenous and minority languages around the world are struggling to keep their languages and cultures alive. More and more language groups are turning to the web as a tool for language revitalization, and as a result there are now thousands of people blogging and using social media sites like Facebook and Twitter in their native language. These sites have allowed sometimes-scattered communities to connect and use their languages online in a natural way. Social media have also been important in engaging young people, who are the most important demographic in language revitalization efforts. Together we’re breaking down the idea that only global languages like English and French have a place online!”

In September 2011, Kevin Scannell also launched Indigenous Blogs to identify blogs written in 50 minority languages and create the same kind of platform. He began with the blogs hosted by Blogspot (which also hosts his own blog), and went on with the blogs on WordPress and Tumblr. Indigenous Blogs included 74 languages in March 2013, and 85 languages in October 2017.

There is still a lot to do

Robert Beard, co-founder of one of the first websites to create a section for endangered languages, wrote in January 2000: “Languages that are endangered are primarily languages without writing systems at all (only 1/3 of the world’s 6,000+ languages have writing systems). I still do not see the web contributing to the loss of language identity and still suspect it may, in the long run, contribute to strengthening it. More and more Native Americans, for example, are contacting linguists, asking them to write grammars of their language and help them put up dictionaries. For these people, the web is an affordable boon for cultural expression.”

There is still a lot to do to help language revitalization efforts. Some minority languages still need basic language technologies like keyboard settings or spell checkers. Other minority languages have the basic language technologies, but they still need language dictionaries, grammars and glossaries.

On International Dictionary Day on 16 October 2015, 15 new and revised South African indigenous language dictionaries began to be published online, with 13 dictionaries available on 30 November 2015, and two dictionaries available in early 2016.

Instead of compiling sometimes unreliable statistical data, why not directly ask these minority and indigenous language communities about their needs? Why not rely on what people have to say and give them a voice, instead of relying on numbers? Their language is the gate to their culture. Using their language instead of a global language, or as a complement to a global language, contributes to the revival of their culture and, just as important, to their self-esteem.

Copyright © 2017-18 Marie Lebert
License CC BY-NC-SA version 4.0

Written by marielebert

2017-10-24 at 18:01
