Towards Language Documentation 2.0: Imagining a crowdsourcing revolution

File         Size       Format
25302.mp3    49.92 MB   MP3
25302.pdf    2.07 MB    Adobe PDF

Item Summary

Title: Towards Language Documentation 2.0: Imagining a crowdsourcing revolution
Issue Date: 12 Mar 2015
Description: Language documentation theory has provided critical insights into the nature of a lasting, multipurpose record of a language (Himmelmann 1998, 2012). Much of the literature has focused on the desirable properties of a comprehensive ‘best record’ of language (Woodbury, 2003). Language documentation leans heavily upon traditional linguistic fieldwork methods such as elicitation and detailed transcription performed in the field. These activities are dependent on highly trained linguists as facilitators for every documentary event. The resulting lack of ‘scalability’ in these methods threatens our ability to meet even modest documentary goals (Liberman, 2006).

Responding to productivity concerns, Reiman (2010) introduced Basic Oral Language Documentation (BOLD). The BOLD method utilises phrase-aligned ‘oral transcriptions’ with the aim of deferring transcription until after fieldwork. BOLD may be enacted by participants with limited training, thereby sidestepping a major impediment to scaling up documentary activity. The Aikuma smartphone application implements an interactive variant of the BOLD method (Bird & Hanke, 2013; Bird et al., 2014). While Aikuma is still under development, field trials have shown that participants are able to autonomously collect spoken narratives with respeaking and translation. The assumption to date is that these tools would be deployed by a field linguist to complement an evolving linguistic description.

Yet the ever-expanding footprint of the World Wide Web means that we would be foolish to believe that field linguists will be the sole facilitators of documentary activities. As the web reaches new frontiers today, it is the ‘Web 2.0’, replete with social networks, that communities will first encounter. Where there is the will to maintain their language, communities are increasingly finding that they have the tools to do so, such as the Mapuche people of Chile, who use Facebook, YouTube and Twitter to promote and preserve their linguistic heritage (Campbell & Huck, 2013). This raises the question of whether purpose-built linguistic crowdsourcing tools can and should interact with the Web 2.0 ecosystem. We also note that the social web has evolved solutions for other documentary challenges. Nathan (2006) described how 2.0 ‘sharing’ features in the SOAS ELAR archive facilitate stakeholder negotiations to manage the complexities of access and distribution in language documentation.

Using examples from recent field trials and urban fieldwork, we demonstrate the potential of participant-driven documentation to produce scalable corpora of natural language and discuss the trade-offs between fidelity and quantity. However, the process of developing and using these tools has also heightened our awareness of the potential ramifications of social computing and language documentation. We conclude with a thought experiment based on a proposed ‘linguistic social network’ and the linguist of the future.


Boerger, Brenda. 2011. To boldly go where no one has gone before. Language Documentation & Conservation 5. 208–233.

Campbell, Baird & James Huck. 2013. Social Media as a Tool for Linguistic Maintenance and Preservation among the Mapuche. Proceedings of the 2013 LAGO Graduate Student Conference “Decolonizing the Americas”, Tulane University.

Himmelmann, Nikolaus P. 1998. Documentary and Descriptive Linguistics. Linguistics 36. 161–195.

Himmelmann, Nikolaus P. 2012. Linguistic Data Types and the Interface between Language Documentation and Description. Language Documentation & Conservation 6. 187–207.

Liberman, Mark. 2006. The problems of scale in language documentation. Computational Linguistics for Less-Studied Languages workshop, Texas Linguistics Society.

Reiman, D. Will. 2010. Basic oral language documentation. Language Documentation & Conservation 4. 254–268.

Woodbury, Anthony C. 2003. Defining language documentation. In Peter K. Austin (ed.), Language Documentation and Description, vol. 1. 35–51. London: SOAS.
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Appears in Collections: 4th International Conference on Language Documentation and Conservation (ICLDC)

Items in ScholarSpace are protected by copyright, with all rights reserved, unless otherwise indicated.