Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment

dc.creator: Eleanor Chodroff
dc.creator: Emily P. Ahn
dc.creator: Hossep Dolatian
dc.date.accessioned: 2025-07-31T23:39:14Z
dc.date.available: 2025-07-31T23:39:14Z
dc.date.copyright: 2025
dc.date.issued: 2025-07
dc.description.abstract: Phonetic forced alignment can greatly expedite spoken language analysis by providing automatic time alignments at the word and phone levels. In the case of low-resource languages, it remains an open question whether phone-level forced alignment will be more successful with a small language-specific acoustic model or a high-resource cross-language acoustic model. The present study directly compared the forced alignment performance of language-specific and cross-language acoustic models using the Urum and Evenki datasets from the DoReCo Corpus. We evaluated six language-specific acoustic models trained with 5, 10, 15, 20, 25, or approximately 70 minutes of language-specific speech data against four English-based cross-language acoustic models that differed in size and accent homogeneity (large Global English or homogeneous American English of varying data amounts). Acoustic models were developed or obtained from the Montreal Forced Aligner and evaluated against held-out manually aligned phone boundaries. Overall, the Global English model and the larger language-specific acoustic models were competitive with one another and outperformed the homogeneous cross-language and smaller language-specific acoustic models. From this analysis, we recommend that researchers use a language-specific model with at least 25 minutes of actual speech (not just recording duration) or a large, diverse cross-language acoustic model for low-resource forced alignment.
dc.description.sponsorship: National Foreign Language Resource Center
dc.format: Article
dc.format.extent: 23
dc.identifier.citation: Chodroff, Eleanor, Emily P. Ahn, Hossep Dolatian. 2025. Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment. Language Documentation & Conservation 19: 201-223.
dc.identifier.issn: 1934-5275
dc.identifier.uri: https://hdl.handle.net/10125/74817
dc.language: eng
dc.publisher: University of Hawaii Press
dc.title: Comparing language-specific and cross-language acoustic models for low-resource phonetic forced alignment
dcterms.rights: Creative Commons Attribution-NonCommercial 4.0 International
dcterms.type: Text
prism.endingpage: 223
prism.publicationname: Language Documentation & Conservation
prism.startingpage: 201
prism.volume: 19

Files

Original bundle

Name: Chodroff_etal_2025.pdf
Size: 1.65 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.73 KB
Format: Item-specific license agreed upon to submission