Please use this identifier to cite or link to this item: http://hdl.handle.net/10125/26113

Technology in documentation: TEI and the Nxa'amxcín Dictionary

Files:
26113.mp3 (MP3, 21.59 MB)
26113.pdf (Adobe PDF, 2.87 MB)

Item Summary

Title: Technology in documentation: TEI and the Nxa'amxcín Dictionary
Issue Date: 02 Mar 2013
Description: The expanding use of technology in endangered language documentation has increased interest in the development of digital standards for lexical information. Many digital lexica developed by linguists use standards such as LIFT/GOLD (e.g., SIL's Toolbox, FLEx) or LMF/DCR (e.g., LEXUS/ViCoS; Aristar-Dry et al. 2012), but few are reported to use TEI, even though TEI is a Digital Humanities standard with a dictionary module (TEI, ch. 9; Romary and Wegstein 2012). In this paper, we outline a project for Nxa'amxcín (Salish) that uses TEI structure and markup. We argue that TEI is a useful tool for endangered language lexica.

The original Nxa'amxcín print dictionary project, begun using Lexware (Hsu 1985), WordPerfect, and DOS, was exemplary in 1991, but its dependence on customized character sets, obsolete printer fonts, macros, and a Hercules graphics card made the data unusable by 2005. The data were retrieved and converted to a modern format through a lengthy process (Author and Newton 2008). In the absence of a stable non-proprietary standard (ISO 24613, the Lexical Markup Framework, was released only in 2008), and following guidelines for interoperability and portability (Bird and Simon 2003) and for the use of open formats (see, e.g., Good 2011), TEI seemed an obvious choice in 2005: it is widely used for born-digital documents and provides a wide range of tags for dictionaries, linguistic analysis, and corpus linguistics (chs. 15-18).

In our paper we show that, as an open, mature standard, TEI is a useful encoding strategy for our entire project and provides a reliable archival format for Nxa'amxcín data. Its infrastructure is more than a set of schemas and encoding guidelines (ch. 23): it enables users to tightly constrain a schema to only the elements and attributes a specific project requires. It is flexible enough to encode morphological relationships, which is invaluable for the complex Salish morphology of Nxa'amxcín. TEI also generates project-specific documentation embedded directly in a RELAX NG schema, providing inline help for XML encoders; it incorporates peripheral data into the same digital corpus; and it links easily across collections. The XML data serve as the basis for an online digital dictionary, print dictionaries, wordlists, the structure and supplementary material of the dictionary website, and teaching and practice materials. Finally, editing with a well-documented TEI schema is relatively easy and does not depend on an externally controlled web application for data entry.
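The claim that one TEI XML source can feed wordlists and other outputs, while also recording morphological relationships, can be illustrated with a minimal sketch. The Python below uses only the standard library to parse a small TEI dictionary fragment and derive (a) a sorted wordlist with morpheme breakdowns and (b) a map of cross-references between entries. The element choices (`<form type="morphemic">` containing `<m>` morphemes, `<xr>`/`<ref>` pointing to a base entry), the function names `wordlist` and `cross_references`, and all headwords and glosses are hypothetical placeholders, not the project's actual schema or real Nxa'amxcín data.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace is real; everything else here is an illustrative assumption.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified TEI fragment. Headwords and glosses are placeholders,
# NOT real Nxa'amxcín data, and the encoding is one plausible choice only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>Sample lexicon</title></titleStmt>
    <publicationStmt><p>Unpublished sample</p></publicationStmt>
    <sourceDesc><p>Invented for illustration</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body><div>
    <entry xml:id="e1">
      <form type="lemma"><orth>rootA</orth></form>
      <gramGrp><pos>v</pos></gramGrp>
      <sense><def>placeholder gloss one</def></sense>
    </entry>
    <entry xml:id="e2">
      <form type="lemma"><orth>rootAsfx</orth></form>
      <form type="morphemic"><m>rootA</m><m>sfx</m></form>
      <gramGrp><pos>n</pos></gramGrp>
      <sense><def>placeholder gloss two</def></sense>
      <xr type="base"><ref target="#e1">rootA</ref></xr>
    </entry>
  </div></body></text>
</TEI>"""


def wordlist(tei_xml: str) -> list[tuple[str, str, str, str]]:
    """Return (headword, pos, gloss, morpheme breakdown) rows sorted by headword."""
    root = ET.fromstring(tei_xml)
    rows = []
    for entry in root.iterfind(".//tei:entry", TEI_NS):
        headword = entry.findtext("tei:form[@type='lemma']/tei:orth", default="", namespaces=TEI_NS)
        pos = entry.findtext("tei:gramGrp/tei:pos", default="", namespaces=TEI_NS)
        gloss = entry.findtext("tei:sense/tei:def", default="", namespaces=TEI_NS)
        # Join the <m> morphemes (if any) with hyphens, e.g. "rootA-sfx".
        morphs = [m.text or "" for m in entry.iterfind("tei:form[@type='morphemic']/tei:m", TEI_NS)]
        rows.append((headword, pos, gloss, "-".join(morphs)))
    return sorted(rows)


def cross_references(tei_xml: str) -> dict[str, list[str]]:
    """Map each entry's xml:id to the ids its <xr>/<ref> elements point to."""
    root = ET.fromstring(tei_xml)
    links = {}
    for entry in root.iterfind(".//tei:entry", TEI_NS):
        targets = [ref.get("target", "").lstrip("#") for ref in entry.iterfind("tei:xr/tei:ref", TEI_NS)]
        links[entry.get(XML_ID)] = targets
    return links


if __name__ == "__main__":
    # Print a tab-separated wordlist derived from the same XML source.
    for row in wordlist(SAMPLE_TEI):
        print("\t".join(row))
    print(cross_references(SAMPLE_TEI))
```

Because each output is only a different traversal of the same entries, a print or web view would be another small transform (for example, an XSLT stylesheet) over the same file rather than a separate copy of the data.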

Because TEI is not widely used for endangered languages, we conclude by comparing TEI, LMF/DCR, and LIFT/GOLD as they might apply to a Nxa'amxcín lexicon.
URI/DOI: http://hdl.handle.net/10125/26113
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Appears in Collections: 3rd International Conference on Language Documentation and Conservation (ICLDC)