Developments in SayMore: The language documentation tool for citizen scientists

Moeller, Sarah
Citizen scientists, some of whom may be speakers of endangered languages, may want to document a language but conclude that the current bar for entry is too high. What should be archived? How much metadata is needed? What, in fact, is metadata? They need background training to know that documentation must include translations, transcriptions, and metadata as well as the original recordings. Managing the ever-increasing data by hand can be an insurmountable hurdle, and even once that hurdle is cleared, the corpus must still comply with the archive’s preferences before it can be deposited. SayMore is a software tool that allows citizen scientists to engage in language documentation without being hindered by the language of metadata, scared off by data management, or intimidated by the archiving process. SayMore knows the workflow of language documentation and guides the user through its steps, providing support for transcription and translation and displaying progress charts that summarize the state of the whole corpus. First, SayMore elicits the needed metadata through forms. Pre-defined fields prompt users to supply it by filling in the blanks, so the user does not have to research what metadata is needed, understand how to comply with metadata standards such as OLAC or IMDI, or know how to generate XML files. Second, SayMore takes care of data management without distracting the user with technical details. Bundles of files are grouped into folders, and files are automatically renamed to reduce broken links caused by typos. Third, SayMore automates the deposit of a well-formed corpus to an archive. With the click of a button, users can create a corpus package formatted to an archive’s specifications. The resulting deposits will increase the accessibility of materials for language conservation and pedagogy. SayMore could and should allow any archive to receive well-formed corpora from any user, but achieving this requires feedback from more archives and users.
This presentation will encourage dialogue about how this tool can be advanced. After describing how SayMore addresses these issues, the presentation will briefly describe a case study in which the author used SayMore to build corpora of two endangered languages of the Caucasus Mountains. Over 500 recordings had been stored in scattered locations. Four people with no background in language documentation became involved by completing easy tasks in SayMore, such as filling in metadata forms.