Small language, big data: Building the Gurindji Kriol corpus to model the emergence of a mixed language
Publisher
University of Hawaii Press
Volume
19
Starting Page
348
Ending Page
367
Abstract
At 178 hours and 853,348 words, the Gurindji Kriol corpus (Meakins & Algy 2004) is currently the largest annotated corpus of an Australian Indigenous language, and is a significant record of the community’s language use in a complex multilingual environment. Together with the Gurindji corpus, four generations of language use and change in the Gurindji community are represented, including the rare emergence of a mixed language. In this paper, we present details on the development of this corpus, in particular the complex processes of corralling this data into a consistent format that enables quantitative and computational work. The scale, breadth and consistency of the corpus have enabled innovative research into questions of language variation, contact, emergence and change, and have helped the Gurindji community to better understand linguistic changes and continuities across generations. Data cleaning and annotation are often overlooked in discussions of data management within the field of language documentation. However, they are important steps in any quantitative research, and the amount of work required can be significantly reduced with thoughtful automation. Our approach, drawn from industry best practice, may provide a useful model for others working on the development of corpora of low-resource languages.
Citation
Wilmoth, Sasha, Felicity Meakins, Cassandra Algy. 2025. Small language, big data: Building the Gurindji Kriol corpus to model the emergence of a mixed language. Language Documentation & Conservation 19: 348-367.
Extent
20
Format
Article
Rights
Creative Commons Attribution-NonCommercial 4.0 International
