Collaborative corpus building for minorized languages using wiki-technology. Documenting the Asturian language

dc.contributor.author Larusson, Johann
dc.contributor.author Saurí, Roser
dc.contributor.author Viejo, Xulio
dc.date.accessioned 2009-04-08T01:35:16Z
dc.date.available 2009-04-08T01:35:16Z
dc.date.issued 2009-04-08T01:35:16Z
dc.description.abstract Eslema is the first project devoted to building a corpus for Asturian. Asturian (or Asturian-Leonese) is the Romance language autochthonous to most of the territory in the provinces of Asturias, Leon and Zamora (Spain) and the district of Miranda do Douro (Portugal). Its community of speakers is estimated at around 300,000 people, approximately a third of the population of the area where Asturian is spoken. These figures bode ill for the future of the language, since competence in Asturian is notably reduced among young people, a fact that seriously threatens its generational transmission (Llera Ramo, 2002). As the corpus of a minorized language, Eslema has two main goals: (a) documenting Asturian in a systematic way, and (b) helping lay the foundation for codifying and fully normalizing it as the language of use in any possible social context. To that end, the project is conceived as a general framework for developing several subcorpora, including documents of varied typology and from different historical periods, representing both written and oral discourse (Author, 2008a). Because Eslema has very little funding, we have had to find the resources it needs through alternative means. As with many Western minorized languages, Asturian speakers feel a degree of commitment to the language and its survival. Building on this commitment, we have developed a wiki-based environment that enables the entire Asturian community to collect and annotate texts collaboratively online, enlarging Eslema at minimal cost. Wikis are ideally suited to this kind of activity. A wiki is essentially a website that lets users who are not co-located easily co-edit and share documents asynchronously. Wikis are very loosely structured and favor neither a particular type of content nor a “tech-savvy” way of manipulating it.
Previous research has developed a platform called the WikiDesignPlatform (WDP) to support different kinds of wiki-based collaborative learning activities (Author, 2008b). The WDP provides a suite of awareness, navigational, and communicative components that can easily be layered on top of, or coupled with, standard wiki features. Using the WDP, we are able to quickly engineer an online workspace tailored to the needs of the community. Users can easily suggest documents for classification, classify texts collectively, and communicate about their work. Through the WDP’s awareness features, users can stay up to date on the progress of their own work and of individual documents. This paper presents the collaborative WDP-based environment we have built, its application to compiling the Asturian corpus, and the results obtained.
References:
Author (2008a). Eslema. Towards a Corpus for Asturian. In Collaboration: Interoperability between People in the Creation of Language Resources for Less-resourced Languages. A SALTMIL workshop, LREC 2008, Marrakesh.
Author (2008b). Supporting and Tracking Collective Cognition in Wikis. In Proceedings of ICLS 2008: International Conference for the Learning Sciences, Vol. 3 (pp. 330–337). The International Society of the Learning Sciences.
Llera Ramo, F. (2002). II Estudiu sociollingüísticu d’Asturies. Avance de datos. Lletres Asturianes, 89, 181–197.
dc.identifier.uri http://hdl.handle.net/10125/4984
dc.language.iso en
dc.title Collaborative corpus building for minorized languages using wiki-technology. Documenting the Asturian language
dc.type Conference Paper
dc.type.dcmi Text
Files
Original bundle
Name: 4984.pdf
Size: 850.52 KB
Format: Adobe Portable Document Format