Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment

Loading...
Thumbnail Image

Contributor

Advisor

Department

Instructor

Depositor

Speaker

Researcher

Consultant

Interviewer

Interviewee

Narrator

Transcriber

Annotator

Journal Title

Journal ISSN

Volume Title

Publisher

University of Hawaii Press

Volume

19

Number/Issue

Starting Page

201

Ending Page

223

Alternative Title

Abstract

Phonetic forced alignment can greatly expedite spoken language analysis by providing automatic time alignments at the word and phone levels. In the case of low-resource languages, it remains an open question whether phone-level forced alignment will be more successful with a small language-specific acoustic model or a high-resource cross-language acoustic model. The present study directly compared the forced alignment performance of language-specific and cross-language acoustic models using the Urum and Evenki datasets from the DoReCo Corpus. We evaluated six language-specific acoustic models trained with 5, 10, 15, 20, 25, or approximately 70 minutes of language-specific speech data against four English-based cross-language acoustic models that differed in size and accent homogeneity (large Global English or homogeneous American English of varying data amounts). Acoustic models were developed or obtained from the Montreal Forced Aligner and evaluated against held-out manually aligned phone boundaries. Overall, the Global English model and the larger language-specific acoustic models were competitive with one another and outperformed the homogeneous cross-language and smaller language-specific acoustic models. From this analysis, we recommend that researchers use a language-specific model with at least 25 minutes of actual speech (not just recording duration) or a large, diverse cross-language acoustic model for low-resource forced alignment.

Description

Keywords

Citation

Chodroff, Eleanor, Emily P. Ahn, Hossep Dolatian. 2025. Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment. Language Documentation & Conservation 19: 201-223.

DOI

Extent

23

Format

Article

Geographic Location

Time Period

Related To

Related To (URI)

Table of Contents

Rights

Creative Commons Attribution-NonCommercial 4.0 International

Rights Holder

Catalog Record

Local Contexts

Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.