Title: Language-specific encoding in endangered language corpora
Author: Gippert, Jost
Date: 2012-08
Publisher: University of Hawai'i Press
Citation: Gippert, Jost. 2012. Language-specific encoding in endangered language corpora. In Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek (eds). 2012. Potentials of Language Documentation: Methods, Analyses, and Utilization. 25-31. Honolulu: University of Hawai'i Press.
Abstract: The paper addresses problems of corpus building and retrieval resulting from code-switching, which is a characteristic feature of endangered language recordings. The typical appearance of code-switching phenomena is first outlined on the basis of data collected in the DoBeS ‘ECLinG’ project, which dealt with three endangered Caucasian languages spoken in Georgia: Tsova-Tush (Batsbi), Udi, and Svan. The problem of language-specific retrieval is illustrated with examples showing the usage of the word da in Tsova-Tush contexts, which represents, as a homonym, either a native copula form (‘it is’) or the Georgian conjunction ‘and’. The subsequent section discusses the annotation requirements that are necessary to automatically distinguish the languages involved in code-switching, with a focus on the emerging ISO standard 639-6. It is argued that the fine-grained distinction of varieties and subvarieties and their interrelationship – as aimed at in this standard – requires a thorough reconsideration if it is to be applied in the markup of corpus data.
Series/Report No.: LD&C Special Publication
Sponsorship: National Foreign Language Resource Center
ISBN: 978-0-9856211-0-0
URI: http://hdl.handle.net/10125/4512
Rights: Creative Commons Attribution Non-Commercial Share Alike License