Why documenting different languages necessitates different data

Date
2013-02-28
Authors
McDonnell, Bradley
Speaker
McDonnell, Bradley
Description
Since the inception of Documentary Linguistics as a sub-discipline of linguistics, most of its theoretical issues have revolved around data collection. A particularly lively discussion has focused on the balance between constructing a corpus of naturally occurring discourse and collecting isolated examples through direct elicitation (cf. Evans 2008, Himmelmann 2012). While some advocate a strong focus on constructing corpora of naturally occurring discourse (e.g., Himmelmann 2006), others have raised concerns that over-emphasizing the role of discourse data will result in aspects of the language being neglected (e.g., Rhodes et al. 2006, Chelliah & de Reuse 2011). Using languages of North America, Mithun (2001) demonstrates that the optimal balance between discourse data and elicited data differs depending on the linguistic domain in question (e.g., lexicon, phonology, grammar). The question remains, however, to what extent this balance is similar for languages with different typological profiles (e.g., isolating, agglutinating, or polysynthetic languages) and in various sociolinguistic situations (e.g., monolingual, bilingual, or multilingual speech communities). Drawing on the documentation of Besemah, a Malay language of southwest Sumatra, this paper presents case studies of (1) the periphrastic passive, (2) the headless relative clause construction, and (3) the syncretic causative/applicative construction, in order to show that expanding the role of naturalistic discourse data increases both the depth and insight of the grammatical analyses of the language. Like most Malay languages of Sumatra, Besemah is considered an ‘underspecified’ language (cf. Gil 2001), as it lacks grammatical marking of nominal categories like person, number, and case as well as verbal categories like tense and agreement.
Additionally, Besemah represents a complex sociolinguistic setting, being polyglossic with two other Malay varieties, Standard Indonesian (the language of government, education, and the media) and Palembang Indonesian (the lingua franca of southern Sumatra). The relative similarity of these Malay varieties, coupled with the sociolinguistic complexity with which they are used, makes it difficult to obtain reliable elicited grammatical examples and/or grammaticality judgments. At the same time, the nature of the ‘underspecified’ grammar requires much less direct elicitation of paradigms, allomorphy, etc. Based on these factors, the aim of this paper is to show that striking the balance between discourse data and elicited data is not one-size-fits-all, but depends on a number of factors. An important outcome of this study, therefore, is to illustrate that language documentation projects need to calibrate data collection methodologies to accommodate the individual differences that each language possesses.
Rights
Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported