Language-specific encoding in endangered language corpora
|Title:||Language-specific encoding in endangered language corpora|
|Date Issued:||Aug 2012|
|Publisher:||University of Hawai'i Press|
|Citation:||Gippert, Jost. 2012. Language-specific encoding in endangered language corpora. In Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek (eds). 2012. Potentials of Language Documentation: Methods, Analyses, and Utilization. 25-31. Honolulu: University of Hawai'i Press.|
|Series:||LD&C Special Publication|
|Abstract:||The paper addresses problems of corpus building and retrieval resulting from code-switching, which is a characteristic feature of endangered language recordings. The typical appearance of code-switching phenomena is first outlined on the basis of data collected in the DoBeS ‘ECLinG’ project, which dealt with three endangered Caucasian languages spoken in Georgia: Tsova-Tush (Batsbi), Udi, and Svan. The problem of language-specific retrieval is illustrated with examples showing the usage of the word da in Tsova-Tush contexts, which represents, as a homonym, either a native copula form (‘it is’) or the Georgian conjunction ‘and’. The subsequent section discusses the annotation requirements that are necessary to automatically distinguish the languages involved in code-switching, with a focus on the emerging ISO standard 639-6. It is argued that the fine-grained distinction of varieties and subvarieties and their interrelationship – as aimed at in this standard – requires a thorough reconsideration if it is to be applied in the markup of corpus data.|
|Rights:||Creative Commons Attribution Non-Commercial Share Alike License|
|Appears in Collections:||LD&C Special Publication No. 3: Potentials of Language Documentation: Methods, Analyses, and Utilization|