
A quantitative analysis of linguistic metadata

File | Size | Format
25299.mp3 | 59.48 MB | MP3
25299.pdf | 12.29 MB | Adobe PDF

Item Summary

Title: A quantitative analysis of linguistic metadata
Issue Date: 12 Mar 2015
Description: Language documentation is both labor-intensive and time-consuming. Trained linguists cannot contribute enough hours to document every one of the world's languages to a fully satisfactory level, so we should use every tool at our disposal to decide which languages to focus on. Community interest, relationships with community members and consultants, and the personal interests of linguists themselves will always play a major role in determining which languages are documented. Beyond these, what factors should we use in choosing languages to document, and which factors have we actually been using? Advice on this topic varies: some say endangerment is an important factor (Krauss, 1992), while others say it should be ignored completely (Newman, 2013). Our major funding bodies seek to support linguists documenting endangered languages (Nathan, 2013). Meanwhile, our literature routinely assumes an inverse correlation between language endangerment and the extent of completed linguistic research (King, 2008), while rejecting an inverse correlation between language size and endangerment (Nettle, 2000). In both cases, where evidence is given at all, it consists of a handful of individual languages, far too small a sample to support a statistical claim. In short, linguists speculate about statistical relations in linguistic metadata even though no quantitative analysis has been carried out. Furthermore, linguists can be skeptical of quantitative methods and of those who use them, treating data mining and analytics as suspect methods used by outsiders who are not sensitive to the issues linguists face. In this context, we as a community need to explore quantitative analyses of our data and see where they lead us. 
In an attempt to begin this process, I ask whether correlations exist among the metadata we collect, including language size, endangerment and typological classification, as well as the extent of available documentation and pedagogical materials. With this work I hope to open a discussion about what the presence or absence of such correlations means, what we should learn from this statistical analysis, and where we should go from there.
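One standard way to test for such correlations, when some metadata are ordinal (endangerment levels) and others are skewed counts (speaker numbers, documents), is Spearman's rank correlation. The sketch below is a minimal, standard-library Python illustration, not the method used in this study. The 0-to-5 endangerment scale, the document counts and the function names are all hypothetical, and the toy data are invented for illustration only.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data for eight languages (invented, not real metadata):
# endangerment on an assumed 0 (safe) to 5 (most endangered) scale, and
# the number of published grammars, dictionaries and texts per language.
endangerment = [0, 1, 1, 2, 3, 4, 5, 5]
documents = [40, 25, 30, 12, 10, 4, 2, 3]

rho = spearman(endangerment, documents)
print(f"Spearman rho = {rho:.3f}")
```

On these invented numbers rho comes out strongly negative, the pattern the literature assumes; whether real metadata behave this way is exactly the open question. A full analysis would also need a significance test, which `scipy.stats.spearmanr` provides along with the coefficient.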

King, Kendall A. (2008). Sustaining linguistic diversity: Endangered and minority languages and language varieties. Georgetown University Press.

Krauss, Michael. (1992). The world's languages in crisis. Language, 68(1), 4–10.

Nathan, David. (2013). The Hans Rausing Endangered Languages Project.

Nettle, Daniel, & Romaine, Suzanne. (2000). Vanishing voices: The extinction of the world's languages. Oxford University Press.

Newman, Paul. (2013). The law of unintended consequences: How the endangered languages movement undermines field linguistics as a scientific enterprise. Linguistics departmental seminar series, SOAS, London, 15 October.
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Appears in Collections: 4th International Conference on Language Documentation and Conservation (ICLDC)
