Volume 19 : Language Documentation & Conservation
Permanent URI for this collection: https://hdl.handle.net/10125/81750
Recent Submissions
An overview of Mapuzugun writing system proposals, current usage, and political implications
(2025-12) Kelly Baur
The Mapuche language is considered “definitely endangered” by UNESCO (Moseley 2010). However, it is still spoken today in Chile and Argentina by the Mapuche people. In addition to the L1 speakers in these regions, young people are learning and teaching the language in an effort to revitalize it. This paper provides an overview of eight proposals for a Mapuche writing system and how they attempt to find a balance between univocality, limiting confusion for L1 Spanish speakers, and respecting diverse language ideologies. Based on semi-structured interviews with ten Mapuzugun language revitalization activists living in the Bíobío and Araucanía regions of Chile, this paper provides an overview of Mapuzugun writing system proposals and language policies, highlighting the priorities, goals, and ideologies of language activists. Most participants did not believe that any writing system was inherently more favorable than the others. However, opinions ranged from encouraging one standard writing system to vehemently rejecting standardization. Despite these differences of opinion, they all justified their stances on the belief that their strategy (standardization or not) would do more to increase the number of Mapuzugun speakers and thus be more effective in achieving their shared vision of language revitalization and maintenance.

A survey of language use patterns among undergraduate students in Nagaland
(University of Hawaii Press, 2025-11) Temsunungsang T
Many languages around the world are at different levels of endangerment for varied reasons. The situation in Nagaland is no different, as all languages spoken by the Nagas are categorised as Vulnerable. The present state is a result of the rapid changes in Naga society over the past 150 years, pushed by education, urbanisation, and modernity.
Nagaland is a state in north east India, inhabited by about 20 tribes speaking different languages. Today, in addition to their respective mother tongues, English and Nagamese are widely spoken, with Hindi used occasionally in certain domains. Hence, Nagas have become a truly multilingual society. Given the complex situation, it is important to examine the use of language in various domains, which will help one determine the status of these languages. In this paper, I examine the language use pattern among 2,984 undergraduate students of Nagaland in the home domain (parents, grandparents, siblings, and relatives); informal domain (friends and markets); and the formal domain (institutions, offices, and churches); and their knowledge of the mother tongue. The survey shows that while the use of one’s mother tongue is strong within the home domain, it is strongest between respondents and (grand)parents, while the use of English and Nagamese is increasing between respondents and siblings at the cost of the mother tongue. This points to a generational difference in the use of the mother tongue. This is corroborated by the survey involving respondents and friends (within the same tribe) where use of mother tongue is strong but Nagamese is not far behind. An interesting result is the high usage of English and Nagamese in churches, a formal domain where the mother tongue has strong status. New domains such as social media, which are heavily dominated by English, have not helped the situation. The generational difference and the use of the mother tongue in fewer domains are reflected in the respondents’ level of knowledge of folktales, folksongs, proverbs, and idioms, and specific terms related to flora and fauna, address terms, traditional ornaments, weaving/basketry, and agriculture, which is far from satisfactory.
The paper advocates strengthening of mother tongue education by all stakeholders in a major way, failing which, our mother tongues will lose to English and Nagamese.

Small language, big data: Building the Gurindji Kriol corpus to model the emergence of a mixed language
(University of Hawaii Press, 2025-11) Sasha Wilmoth; Felicity Meakins; Cassandra Algy
At 178 hours and 853,348 words, the Gurindji Kriol corpus (Meakins & Algy 2004) is currently the largest annotated corpus of an Australian Indigenous language, and is a significant record of the community’s language use in a complex multilingual environment. Together with the Gurindji corpus, four generations of language use and change in the Gurindji community are represented, including the rare emergence of a mixed language. In this paper, we present details on the development of this corpus, in particular the complex processes of corralling this data into a consistent format that enables quantitative and computational work. The scale, breadth and consistency of the corpus have enabled innovative research into questions of language variation, contact, emergence and change; and have helped the Gurindji community to better understand linguistic changes and continuities across generations. Data-cleaning and annotation are often overlooked in discussions of data management within the field of language documentation. However, they are important steps in any quantitative research, and the amount of work required can be significantly reduced with thoughtful automation.
Our approach, drawn from industry best practice, may provide a useful model for others working on the development of corpora of low-resource languages.

Translating Language Documentation Texts: A survey of macro-level translation methods used in the translation of endangered language texts in a digital language archive
(University of Hawaii Press, 2025-11) Holly Drayton
A central aim of Language Documentation is the creation and storage of lasting, multi-purpose collections of endangered-language texts, with accompanying annotation. Despite many important developments in the field over the last two decades, there has been no systematic attempt to research or theorise the translation processes or resulting translated texts involved in language documentation work. In this paper I argue that translation practices in Language Documentation can be linked to the field’s social justice goals, and decisions to translate (or not) determine whether a collection can be useful beyond the initial documentation project. I present a study of macro-level translation methods based on a corpus of 1088 texts from 10 deposits in the Endangered Languages Archive (ELAR). I survey the decisions made by researchers about which texts to translate, which genres to translate, which languages to translate into and choice of translator.
This research demonstrates how a key development from the field of Translation Studies, the notion of Translator Agency, could be integrated with Language Documentation to facilitate the production of more multi-purpose, useful translated texts that better support the field’s social justice aims.

The Balinese Homesign Corpus: New insights into corpus development in a rural signing context
(University of Hawaii Press, 2025-11) Satyawati; Ni Made Dadi Astini; Ni Made Sumarni; Ketut Kanta; Josefina Safar; Hannah Lutzenberger; Nick Palfreyman; Connie de Vos
The Balinese Homesign Corpus is a collection of homesign varieties that emerged within the same gestural context as sign languages that are used in northern Bali, such as Kata Kolok. This paper provides a detailed account of the data collection process carried out by a group of local research assistants. An ethnographic overview of the social interaction among homesigners with Kata Kolok signers is also provided. We suggest that several factors, such as topography, gender-specific norms, and technology, may play a role in the social networks of the homesigners. Furthermore, we observe that homesigners in Bali are not living in complete isolation. While they might not have full access to all domains of social life, they are tightly integrated into religious duties and family routines. This paper highlights the importance of locally led data collection for improving the ecological validity and the quality of data.
We suggest methods like mobile ethnographic filmmaking (Moriarty 2020) could provide valuable insights into the role of social interaction in the emergence of homesign in future work.

A phonetic description of Káínai Blackfoot
(University of Hawaii Press, 2025-08) Natalie Weber; Donald Derrick
This paper presents the Blackfoot (Algonquian) phonetic system from data provided by Tootsinam (Beatrice Bullshields, 1945–2015), a native speaker of Káínai’powahsin, the Blackfoot dialect associated with the Blood Nation. There are relatively few phonetic studies of underdocumented languages, and Blackfoot is no exception. We fill this gap by providing a general articulatory description of the segmental, prosodic, and suprasegmental properties of the language, with an aim to provide a starting point for future targeted studies. Blackfoot is an interesting case study because many of the basic phonetic and phonological facts of the language are still highly contested, and because there are several typologically distinctive characteristics compared to well-documented languages, such as the unusual distribution of /s/. Within each section, we summarize all previous research on Blackfoot up to this point and explain which properties are well understood and which require further research. We also present some novel observations of Tootsinam’s speech that differ from existing documentation, including the distribution of short centralized vowels outside of closed syllables, and an allophonic falling tone on word-final stressed syllables.

Teaching food knowledge through a card game
(University of Hawaii Press, 2025-07) Tereza Hlaváčková; Steven Bird
Maintaining and revitalising languages calls for extensive supporting materials, materials which are relevant to the domains of knowledge that people wish to master, and which are culturally safe relative to local pedagogies.
For sustainability, such materials should also be easy to create, and not dependent on significant outside expertise and technology. Over a period of two years, the authors worked with local people on the ground to establish a domain and safe learning methods, and codesigned a card game focused on food knowledge. The cards contain images of plant and animal foods, and facilitate teaching and learning of food practices. We conducted an evaluation of the cards and how people use them, and this revealed how the cards help to create a safe space for learning, and how they are flexible as to the domains of knowledge people wish to maintain. The evaluation also established that this approach is effective for supporting knowledge transmission in the interests of language maintenance and revitalization.

Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment
(University of Hawaii Press, 2025-07) Eleanor Chodroff; Emily P. Ahn; Hossep Dolatian
Phonetic forced alignment can greatly expedite spoken language analysis by providing automatic time alignments at the word and phone levels. In the case of low-resource languages, it remains an open question whether phone-level forced alignment will be more successful with a small language-specific acoustic model or a high-resource cross-language acoustic model. The present study directly compared the forced alignment performance of language-specific and cross-language acoustic models using the Urum and Evenki datasets from the DoReCo Corpus. We evaluated six language-specific acoustic models trained with 5, 10, 15, 20, 25, or approximately 70 minutes of language-specific speech data against four English-based cross-language acoustic models that differed in size and accent homogeneity (large Global English or homogeneous American English of varying data amounts).
Acoustic models were developed or obtained from the Montreal Forced Aligner and evaluated against held-out manually aligned phone boundaries. Overall, the Global English model and the larger language-specific acoustic models were competitive with one another and outperformed the homogeneous cross-language and smaller language-specific acoustic models. From this analysis, we recommend that researchers use a language-specific model with at least 25 minutes of actual speech (not just recording duration) or a large, diverse cross-language acoustic model for low-resource forced alignment.

Upper Faifi as an Endangered Arabic Variety: A Philological, Descriptive, and Acoustic Study
(University of Hawaii Press, 2025-06) Essa Alfaifi; Yahya Aldholmi; Tyler Lee; Jaycie Ryrholm Martin; Sarah Mountain
The Faifi variety, classified as an Arabic dialect (albeit with controversy), is chiefly spoken in southwestern Saudi Arabia by a diminishing number of autochthonous Faifi people. This dialect has been the subject of both synchronic and diachronic description and analysis in a few published studies and a small number of unpublished works. While some studies have focused on very narrow aspects of Faifi, such as the phoneme /st/, others have taken a broad scope and attempted either to address an entire linguistic level, such as syntax or phonology, or to touch upon different linguistic levels in one study. To date, however, no study has acoustically documented the dialect. Thus, the current paper intends to complement previous studies and further elaborate on the distinction between two Faifi subvarieties that existing scholarship seems to have overlooked: Upper and Lower Faifi.
We detail an acoustic description of Upper Faifi consonants and vowels and their interaction and provide auditory materials to support our acoustic analysis to contribute a milestone description of this variety and to help future researchers obtain access to this minority of speakers.

Flibl: A Tool to Ease Text Transfer Between ELAN and FLEx
(University of Hawaii Press, 2025-04) Amalia Skilton; Sunkulp Ananthanarayan; Sofia Gottlieb Pierson; Claire Bowern
Two of the most common software tools in language documentation are ELAN, for transcription, and FieldWorks Language Explorer (FLEx), for interlinearization. Many language documentarians use these tools together, and the transcribed output of ELAN is natural input to FLEx. Despite this, out of the box the two programs are not effectively interoperable. FLEx also does not allow users to display many data structures that are visible in ELAN and necessary for research purposes, such as speaker attributions. Therefore, we created Flibl [flɪbɫ̩], a software tool that automatically converts between the data formats used by ELAN and FLEx while keeping all ELAN information visible. This article offers a description and tutorial on the software. First, we describe our research motivations for creating Flibl, how researchers can use it and for what topics, and how the software works on the backend. Readers interested in using Flibl can download it from our stable repository at https://github.com/amaliaskilton/flibl.

The Corpus of Spoken Yiddish in Europe: Goals, Methods, and Applications
(University of Hawaii Press, 2025-03) Isaac L. Bleaman; Chaya R. Nove
We introduce the Corpus of Spoken Yiddish in Europe (CSYE), an Open Access digital language archive based on several hundred testimony interviews with Holocaust survivors from the USC Shoah Foundation.
The testimonies are a uniquely rich source of information on all aspects of European Yiddish: its regional dialects, grammatical structures, registers and styles, prosody, cospeech gestures, and other topics. Because the survivors represent a socially and geographically diverse cross-section of Yiddish-speaking society, their testimonies are an invaluable resource on the language as it was transmitted from generation to generation before the genocide of European Jewry. This article outlines the CSYE development workflow and highlights use cases for its materials in linguistic research and other domains.

River walks, reef dives, and Rapid Word Collection: Documenting linguistic and environmental knowledge with interdisciplinary and collaborative methods
(University of Hawaii Press, 2025-05) Christine Schreyer; Ken Longenecker; John Wagner
The lives of Kala people are deeply intertwined with marine and riparian environments on which their livelihoods depend. To document linguistic and environmental knowledge in these communities, as part of our community-based, collaborative language documentation project, we utilized three main interdisciplinary documentation methods: (1) River walks, (2) Marine interviews, utilizing videos from SCUBA dives at local reefs, and (3) modified Rapid Word Collection workshops. About 2000 Kala-speakers live in six coastal villages in Morobe Province, Papua New Guinea. Our research took us to the three southern Kala-speaking villages in 2017 and the three northern Kala-speaking villages in 2019. Here, we discuss the importance of combining these three methods. As our project was highly interdisciplinary, including scholars of anthropology, biology, and linguistics, as well as Kala community researchers and knowledge experts, methods to capture multiple types of knowledge were essential to building the broadest data set possible.
For instance, our inclusion of the river and marine environments provided us with interesting comparisons between the semantic categories of words in English versus Kala. Finally, we discuss how our revised versions of these methods, which we utilized in 2019, enabled us to improve overall workflow through audio-video syncing, transcription, and the reduction of electronic storage requirements.

Bringing psycholinguistics to the field: Experiences from Solomon Islands
(University of Hawaii Press, 2025-03) Åshild Næss; Sebastian Sauppe
The world’s linguistic diversity is severely underrepresented in research on cognitive and neural aspects of language processing, with great consequences for our understanding of the relationship between language, cognition, and the human brain. The practical challenges of carrying out neurophysiological (but also behavioral) experiments under fieldwork conditions are one factor that contributes to this lack of diversity, and meeting them necessarily requires the integration of experimental work in a larger descriptive and documentary context. This paper discusses these challenges and how they may be met, based on the authors’ experiences in carrying out an EEG study on sentence comprehension in Solomon Islands. It argues that reconciling the requirements of experimental studies with those of working with speech communities in the field is certainly challenging, but can be achieved with coordination and a realistic assessment of the resources required.
Moreover, while field-based experimental research should not compete with descriptive and documentary linguistic work as a means of supporting a community in maintaining and developing their language, it can be beneficial in promoting a sense of the value of the language that is not based on its status as endangered, but rather on its specific linguistic features that contribute to insight into human language more generally.

The Documentation of Chedungun and the Pewenche Highlands: Phase One
(University of Hawaii Press, 2025-03) Pablo Fuentes; Sonia Vita-Manquepi
This article provides a descriptive guide to the documentation of Chedungun, the regional variant of Mapudungun (ISO 639-2 code arn) that is spoken by the Pewenche people. The 15-hour documentation is currently deposited in the Endangered Languages Archive (ELAR) and corresponds to Phase One of a long-term initiative that is currently progressing to a postdoctoral project (Phase Two). Both phases are supported and funded by the Endangered Languages Documentation Programme. Since the objective of the project is to document the endangered migratory lifestyle and language of the Pewenche people, we will reflect on how the territorial inaccessibility imposed by the COVID-19 pandemic challenged the project’s elemental strategy, which relied on several documentary journeys to the lands that are seasonally occupied by the Pewenche during the summer for transhumance purposes. We will show why the collaborative workflow sustained by self-documentation practices evolved from an auxiliary tool to a regular and essential element of the team’s current and future projects.

Contextual clips: Prioritizing neglected recordings in corpora
(University of Hawaii Press, 2025-03) Samantha Rarrick; Reza Arab
We collaborated to investigate humor in the existing corpus of Kere (ISO639-3: sst).
This collaboration was a useful test of the Kere corpus and led to the rediscovery of unarchived video recordings, which contained important contextual information. These videos had been deprioritized in the original deposit, but they contained important information that could be used as both data and metadata. We propose the term contextual clips for incidentally-collected recordings which have been deprioritized in some way. Contextual clips may be more naturalistic and offer an effective way to supplement written metadata and other contextual information. Our experience investigating humor also revealed that collaboration as a process can serve as a means to test a corpus. Working across disciplines helped identify future user needs, such as missing contextual information that may not be obvious to a researcher familiar with the corpus. Collaborative research may thus be an elegant solution to some of the known issues in mobilizing corpora. We encourage other researchers who manage corpora to identify contextual clips they may have, evaluate why the files were deprioritized originally, and to consult with communities on how to manage individual files.

Lang*Reg corpus: Documenting intraspeaker variation across languages and registers
(University of Hawaii Press, 2025-03) Nico Lehmann; Vahid Mortezapour; Jozina Vander Klok; Zahra Farokhnejad; David Müller; Elisabeth Verhoeven; Aria Adli
We present a new corpus design for multi-lingual corpora that involve intra-speaker variation in different situational-functional contexts, including primarily spoken but also the written mode, with the aim of enhancing language documentation efforts and resources. We illustrate how this comparative design and the resulting cross-culturally applicable data collection procedure has been successfully realized in order to build the Lang*Reg corpus (Adli et al.
2024), which currently includes five languages from three different language families: German, Persian, Southern Kurdish, Yucatec Maya and Javanese. For each of these languages, the same native speakers were asked to produce language in two types of activities that naturally occur in all the respective cultural contexts: telling a story to a friend, and talking freely with various interlocutors (friend, stranger, taxi driver, university professor). Moreover, our design included the storytelling in two modes, which allows for the comparison between spoken and written modes of the same language user. We show how Lang*Reg provides a versatile resource for many purposes, in particular research into register, due to the variety of situational contexts involved. We also show how German and Persian exploit the right periphery for different register distinctions, and we invite others to use this resource. At the same time, we show how the methodology developed can be used as a template to complement language resources by creating comparable intra-individual, multi-purpose data sets.

Bridging Signed Language Documentation & Spoken Language Documentation
(University of Hawaii Press, 2025-03) Samantha Rarrick
The field of language documentation continues to grow, but an historic split between sign language documentation and spoken language documentation persists. In order to fully understand the linguistic context within a community, it can be necessary to overcome this split by designing language documentation projects to address threatened and unreported languages across modalities. Additionally, these two subfields can lend insights to each other, both with respect to the analysis of individual languages and best practices for language documentation. Drawing on an example of parallel projects to document and describe a spoken language and signed language of Papua New Guinea, this paper provides recommendations for researchers in similar situations.
Benefits and practicalities of team-based research and extensive use of video recordings are discussed as essential for creating holistic language documentation with outcomes which are useful and appropriate for an entire community. Because many endangered and minority spoken languages are used in areas where there is little existing knowledge and documentation of signed languages, this situation is unlikely to be uncommon and this type of work has potential to further sign language linguistics, typology, and best practices for language documentation across modalities.

Polar questions in Arapaho: Confronting challenges of documentation and description
(University of Hawaii Press, 2025-03) Andrew Cowell; Chase Wesley Raymond; Maisa Nammari
This paper examines polar questions in Arapaho, from several perspectives. First, examples are given of consultants’ elicited Arapaho glosses for English-language questions, along with consultant commentary and language ideologies on the proper forms. Of note is the consultants’ preference for negative polar questions. Next, a series of native-speaker-produced bilingual curricular materials are examined, considered as idealized question models outside the context of a specific focus on polar questions. Again, negative versus positive polar questions are compared. Following is a study of polar questions in conversation. Finally, we offer a close examination of one variety of polar questions – requests for verification – using a conversation-analytic approach to examine positive and negative polar questions. The speakers’ generalized ideologies about the preferability of negative polar questions are not supported by the corpus data. However, the data show that there is a preference for negative polar questions in situations where otherwise dispreferred responses may occur.
Secondarily, use of negative questions is linked to politeness behaviors, where allowance for a dispreferred or negative response is a key feature of question construction. The study then notes that consultants seem to interpret elicited polar questions as potentially referring to very generalized public audiences. Without overtly saying so, they default to the “safest” and most “polite” usage – negative polar questions. The paper illustrates the challenges of elicitation in the domain of pragmatics, the varying outcomes one gets from variable data sources and methodologies, and the value of micro-level and multi-modal analysis of natural conversation.
