Item Description

Title: Collaborative corpus building for minorized languages using wiki-technology. Documenting the Asturian language 
Author: Larusson, Johann; Saurí, Roser; Viejo, Xulio
Date: 2009-04-08
Abstract: Eslema is the first project devoted to building a corpus for Asturian. Asturian (or Asturian-Leonese) is the Romance language autochthonous to most of the territory of the provinces of Asturias, León and Zamora (Spain) and the district of Miranda do Douro (Portugal). Its community of speakers is estimated at around 300,000 people, approximately a third of the population of the area where Asturian is spoken. These figures bode ill for the future of the language, since competence in Asturian is notably reduced among young people, a fact that seriously threatens its generational transmission (Llera Ramo, 2002). As the corpus of a minorized language, Eslema has two main goals: (a) documenting Asturian in a systematic way, and (b) helping to lay the foundation for codifying and fully normalizing it as a language of use in any social context. The project is therefore conceived as a general framework for developing several subcorpora, including documents of varied typology and from different historical periods, representing both written and oral discourse (Author, 2008a). Eslema's scarcity of funding has prompted a search for alternative sources of much-needed resources. As with many Western minorized languages, speakers of Asturian feel a degree of commitment to the language and its survival. Using this to our advantage, we have developed a wiki-based environment that enables the entire Asturian community to collaboratively collect and annotate texts online, enlarging Eslema at minimal cost. Wikis are ideally suited to this kind of activity. A wiki is essentially a website that enables non-collocated users to co-edit and share documents easily and asynchronously. Wikis are very loosely structured and do not favor a particular type of content or a "tech-savvy" method of manipulating it.
Previous research has developed a platform called the WikiDesignPlatform (WDP) to support different kinds of wiki-based collaborative learning activities (Author, 2008b). The WDP provides a suite of awareness, navigational, and communicative components that can be easily layered on top of, or coupled with, standard wiki features. Using the WDP, we are able to quickly engineer an online workspace tailored to the needs of the community. Users can easily suggest documents for classification, collectively classify texts, and communicate about their work. Using the WDP's awareness features, users can keep current on the progress of their work and the advancement of individual documents. This paper presents the collaborative WDP-based environment we have built, its application, and the results obtained in compiling the Asturian corpus.
References:
Author (2008a). Eslema. Towards a Corpus for Asturian. In Collaboration: interoperability between people in the creation of language resources for less-resourced languages. A SALTMIL workshop. LREC 2008. Marrakesh.
Author (2008b). Supporting and Tracking Collective Cognition in Wikis. In Proceedings of ICLS 2008: International Conference for the Learning Sciences: Vol. 3 (pp. 330-337). The International Society of the Learning Sciences.
Llera Ramo, F. (2002). II Estudiu sociollingüísticu d'Asturies. Avance de datos. In Lletres Asturianes, 89, 181–197.
URI: http://hdl.handle.net/10125/4984

Item File(s)

Files Size Format
4984.pdf 850.5Kb PDF
