Towards Language Documentation 2.0: Imagining a crowdsourcing revolution

Issue Date: 12 Mar 2015
Description: Language documentation theory has provided critical insights into the nature of a lasting, multipurpose record of a language (Himmelmann, 1998, 2012). Much of the literature has focused on the desirable properties of a comprehensive ‘best record’ of a language (Woodbury, 2003). Language documentation leans heavily upon traditional linguistic fieldwork methods such as elicitation and detailed transcription performed in the field. These activities depend on highly trained linguists to facilitate every documentary event. The resulting lack of ‘scalability’ in these methods threatens our ability to meet even modest documentary goals (Liberman, 2006).

Responding to productivity concerns, Reiman (2010) introduced Basic Oral Language Documentation (BOLD). The BOLD method utilises phrase-aligned ‘oral transcriptions’ with the aim of deferring transcription until after fieldwork. BOLD may be enacted by participants with limited training, thereby sidestepping a major impediment to scaling up documentary activity. The Aikuma smartphone application implements an interactive variant of the BOLD method (Bird & Hanke, 2013; Bird et al., 2014). Although Aikuma is still under development, field trials have shown that participants are able to autonomously collect spoken narratives with respeaking and translation. The assumption to date has been that these tools would be deployed by a field linguist to complement an evolving linguistic description.

Yet the ever-expanding footprint of the World Wide Web means that we would be foolish to believe that field linguists will be the sole facilitators of documentary activities. As the web reaches new frontiers today, it is the ‘Web 2.0’, replete with social networks, that communities will first encounter. Where there’s the will to maintain their language, communities are increasingly finding that they have the tools to do so, such as the Mapuche people of Chile and their use of Facebook, YouTube and Twitter to promote and preserve their linguistic heritage (Campbell & Huck, 2013). This raises the question of whether purpose-built linguistic crowdsourcing tools can and should interact with the Web 2.0 ecosystem. We also note that the social web has evolved solutions for other documentary challenges. Nathan (2006) described how Web 2.0 ‘sharing’ features in the SOAS ELAR archive facilitate stakeholder negotiations to manage the complexities of access and distribution in language documentation.

Using examples from recent field trials and urban fieldwork, we demonstrate the potential of participant-driven documentation to produce scalable corpora of natural language, and we discuss the trade-offs between fidelity and quantity. However, the process of developing and using these tools has also heightened our awareness of the potential ramifications of social computing and language documentation. We conclude with a thought experiment based on a proposed ‘linguistic social network’ and the linguist of the future.


