Item Description

Title: Language-specific encoding in endangered language corpora 
Author: Gippert, Jost
Date: 2012-08
Publisher: University of Hawai'i Press
Citation: Gippert, Jost. 2012. Language-specific encoding in endangered language corpora. In Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek (eds.), Potentials of Language Documentation: Methods, Analyses, and Utilization, 25-31. Honolulu: University of Hawai'i Press.
Abstract: The paper addresses problems of corpus building and retrieval resulting from code-switching, which is a characteristic feature of endangered language recordings. The typical appearance of code-switching phenomena is first outlined on the basis of data collected in the DoBeS ‘ECLinG’ project, which dealt with three endangered Caucasian languages spoken in Georgia: Tsova-Tush (Batsbi), Udi, and Svan. The problem of language-specific retrieval is illustrated with examples showing the usage of the word da in Tsova-Tush contexts, which represents, as a homonym, either a native copula form (‘it is’) or the Georgian conjunction ‘and’. The subsequent section discusses the annotation requirements that are necessary to automatically distinguish the languages involved in code-switching, with a focus on the emerging ISO standard 639-6. It is argued that the fine-grained distinction of varieties and subvarieties and their interrelationship, as aimed at in this standard, requires a thorough reconsideration if it is to be applied in the markup of corpus data.
Series/Report No.: LD&C Special Publication
Sponsorship: National Foreign Language Resource Center
ISBN: 978-0-9856211-0-0
URI: http://hdl.handle.net/10125/4512
Rights: Creative Commons Attribution Non-Commercial Share Alike License

File: 03gippert.pdf (675.5 KB, PDF)
