Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit

Date

2018-09

Publisher

University of Hawaii Press

Starting Page

393

Ending Page

429

Abstract

Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons, such as that the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using Persephone, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.

Keywords

language documentation, automatic speech transcription, automatic speech recognition, natural language processing, endangered languages, sound archive, multimedia corpora, interdisciplinarity, open-source software, open access

Citation

Michaud, Alexis, Oliver Adams, Trevor Anthony Cohn, Graham Neubig & Séverine Guillaume. 2018. Integrating Automatic Transcription into the Language Documentation Workflow: Experiments with Na Data and the Persephone Toolkit. Language Documentation & Conservation 12. 393-429.

Extent

37 pages

Rights

Creative Commons Attribution-NonCommercial 4.0 International
Attribution-NonCommercial 3.0 United States
