Collaborative corpus building for minorized languages using wiki-technology. Documenting the Asturian language

Larusson, Johann
Saurí, Roser
Viejo, Xulio
Eslema is the first project devoted to building a corpus for Asturian. Asturian (or Asturian-Leonese) is the Romance language autochthonous to most of the territory of the provinces of Asturias, Leon and Zamora (Spain) and the district of Miranda do Douro (Portugal). Its community of speakers is estimated at around 300,000 people, approximately a third of the population of the area where Asturian is spoken. These figures bode ill for the future of the language, since competence in Asturian is notably reduced among young people, a fact that seriously threatens its generational transmission (Llera Ramo, 2002). As the corpus of a minorized language, Eslema has two main goals: (a) documenting Asturian in a systematic way, and (b) helping lay the foundation for codifying and fully normalizing it as the language of use in any social context. As such, the project is conceived as a general framework for developing several subcorpora, including documents of varied typology and from different historical periods, representing both written and oral discourse (Author, 2008a).
Eslema’s scarcity of funding has prompted a search for alternative sources of much-needed resources. As with speakers of many Western minorized languages, Asturian speakers feel a degree of commitment to the language and its survival. Using this to our advantage, we have developed a wiki-based environment that enables the entire Asturian community to collaboratively collect and annotate texts online, enlarging Eslema at minimal cost. Wikis are ideally suited for this kind of activity. A wiki is essentially a website that enables non-co-located users to co-edit and share documents easily and asynchronously. Wikis are very loosely structured and do not favor a particular type of content or a “tech-savvy” method of manipulating it. Previous research has developed a platform called the WikiDesignPlatform (WDP) to support different kinds of wiki-based collaborative learning activities (Author, 2008b).
The WDP provides a suite of awareness, navigational, and communicative components that can be easily layered on top of, or coupled with, standard wiki features. Using the WDP, we are able to quickly engineer an online workspace tailored to the needs of the community. Users can easily suggest documents for classification, collectively classify texts, and communicate about their work. Using the WDP’s awareness features, users can keep current on the progress of their work and the advancement of individual documents. This paper presents the collaborative WDP-based environment we have built, its application, and its results in compiling the Asturian corpus.
References:
Author (2008a). Eslema. Towards a Corpus for Asturian. In Collaboration: interoperability between people in the creation of language resources for less-resourced languages. A SALTMIL workshop. LREC 2008. Marrakesh.
Author (2008b). Supporting and Tracking Collective Cognition in Wikis. In Proceedings of ICLS 2008: International Conference for the Learning Sciences: Vol. 3 (pp. 330–337). The International Society of the Learning Sciences.
Llera Ramo, F. (2002). II Estudiu sociollingüísticu d’Asturies. Avance de datos. In Lletres Asturianes, 89, 181–197.