Please use this identifier to cite or link to this item: http://hdl.handle.net/10125/42015

A tool for sharing interlinearized and lexical data in diverse formats

File        Size       Format
42015.pdf   12.48 MB   Adobe PDF
42015.mp3   27.22 MB   MP3

Item Summary

Title: A tool for sharing interlinearized and lexical data in diverse formats
Authors: Kaufman, Daniel
Finkel, Raphael
Issue Date: 02 Mar 2017
Description: The last decade has seen great advances in the development of electronic tools for automated interlinearization, corpus creation, and lexicon building (e.g. FieldWorks Language Explorer [FLEx]), as well as tools for creating time-aligned annotations (e.g. ELAN). However, methods for sharing data in these new formats online lag far behind. While good options exist for lexical data (e.g. Webonary, Lexique Pro), there is no tool for turning a project created in the FLEx software into an online interlinearized corpus. We present here a tool in development which does precisely that. FLEx databases can be searched using regular expressions, and individual lines from a text can be linked to audio and video media. The tool can furthermore bring together linguistic data in diverse formats (from ELAN, Praat, FieldWorks, Toolbox, Shoebox) for a single query and allow for queries over multiple language projects. We discuss the benefits of this program in relation to several ongoing fieldwork projects that are being used to evaluate it. These projects present several interesting challenges. In the first, we attempt to create a unified database from several centuries of documentation, during which the language changed considerably. Similarly, in the second project, we create a unified database for two lexically, syntactically, and phonologically distinct dialects of the same language and show how an interlinearized database facilitates searching across dialects. Finally, in the third project, we show how video data can be integrated into an online FLEx database, a feature which is still lacking in the FLEx software itself. By way of conclusion, we show the audience how to upload their own data (either privately or publicly) and experiment with the tool’s features. Ultimately, the open-source program will be available for anyone interested in hosting their own installation.
URI/DOI: http://hdl.handle.net/10125/42015
Appears in Collections: 5th International Conference on Language Documentation and Conservation (ICLDC)



Items in ScholarSpace are protected by copyright, with all rights reserved, unless otherwise indicated.