Please use this identifier to cite or link to this item: http://hdl.handle.net/10125/4512

Files

File: 03gippert.pdf
Size: 675.59 kB
Format: Adobe PDF

Item Summary

Title: Language-specific encoding in endangered language corpora
Authors: Gippert, Jost
Issue Date: Aug-2012
Publisher: University of Hawai'i Press
Citation: Gippert, Jost. 2012. Language-specific encoding in endangered language corpora. In Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts, and Paul Trilsbeek (eds). 2012. Potentials of Language Documentation: Methods, Analyses, and Utilization. 25-31. Honolulu: University of Hawai'i Press.
Series/Report no.: LD&C Special Publication
Abstract: The paper addresses problems of corpus building and retrieval resulting from code-switching, which is a characteristic feature of endangered language recordings. The typical appearance of code-switching phenomena is first outlined on the basis of data collected in the DoBeS ‘ECLinG’ project, which dealt with three endangered Caucasian languages spoken in Georgia: Tsova-Tush (Batsbi), Udi, and Svan. The problem of language-specific retrieval is illustrated with examples showing the usage of the word da in Tsova-Tush contexts, which represents, as a homonym, either a native copula form (‘it is’) or the Georgian conjunction ‘and’. The subsequent section discusses the annotation requirements that are necessary to automatically distinguish the languages involved in code-switching, with a focus on the emerging ISO standard 639-6. It is argued that the fine-grained distinction of varieties and subvarieties and their interrelationship – as aimed at in this standard – requires a thorough reconsideration if it is to be applied in the markup of corpus data.
Sponsor: National Foreign Language Resource Center
URI/DOI: http://hdl.handle.net/10125/4512
ISBN: 978-0-9856211-0-0
Rights: Creative Commons Attribution Non-Commercial Share Alike License
Appears in Collections: LD&C Special Publication No. 3: Potentials of Language Documentation: Methods, Analyses, and Utilization


