Why documenting different languages necessitates different data

File: 26138.mp3
Size: 37.46 MB
Format: MP3

Item Summary

Title: Why documenting different languages necessitates different data
Authors: McDonnell, Bradley
Issue Date: 28 Feb 2013
Description: Since the inception of Documentary Linguistics as a sub-discipline of linguistics, most of its theoretical issues have revolved around data collection. A particularly lively discussion has focused on the balance between constructing a corpus of naturally occurring discourse and collecting isolated examples through direct elicitation (cf. Evans 2008, Himmelmann 2012). While some advocate for a strong focus on constructing corpora of naturally occurring discourse (e.g., Himmelmann 2006), others have raised concerns that over-emphasizing the role of discourse data will result in aspects of the language being neglected (e.g., Rhodes et al. 2006, Chelliah & de Reuse 2011). Using languages of North America, Mithun (2001) demonstrates that an optimal balance between discourse data and elicited data will differ depending on the linguistic domain in question (e.g., lexicon, phonology, grammar). The question remains, however, to what extent this balance is similar for languages with different typological profiles (e.g., isolating, agglutinating, or polysynthetic languages) and in various sociolinguistic situations (e.g., monolingual, bilingual, or multilingual speech communities). Drawing on the documentation of Besemah, a Malay language of southwest Sumatra, this paper presents case studies of (1) the periphrastic passive, (2) the headless relative clause construction, and (3) the syncretic causative/applicative construction, in order to show that expanding the role of naturalistic discourse data increases both the depth and insight of the grammatical analyses of the language.

Like most Malay languages of Sumatra, Besemah is considered an ‘underspecified’ language (cf. Gil 2001), as it lacks grammatical marking of nominal categories like person, number, and case as well as verbal categories like tense and agreement. Additionally, Besemah represents a complex sociolinguistic setting, being polyglossic with two other Malay varieties, Standard Indonesian (the language of government, education, and the media) and Palembang Indonesian (the lingua franca of southern Sumatra). The relative similarity of these Malay varieties, coupled with the sociolinguistic complexity with which they are used, makes it difficult to obtain reliable elicited grammatical examples and/or grammaticality judgments. At the same time, the nature of the ‘underspecified’ grammar requires much less direct elicitation of paradigms, allomorphy, etc. Based on these factors, the aim of this paper is to show that striking the balance between discourse data and elicited data is not one-size-fits-all, but is dependent on a number of factors. An important outcome of this study, therefore, is to illustrate that language documentation projects need to calibrate data collection methodologies to accommodate the individual differences that each language possesses.
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Appears in Collections: 3rd International Conference on Language Documentation and Conservation (ICLDC)

Items in ScholarSpace are protected by copyright, with all rights reserved, unless otherwise indicated.