Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora

Haig, Geoffrey; Schnell, Stefan; Schiborr, Nils Norman

dc.contributor.author	Haig, Geoffrey
dc.contributor.author	Schnell, Stefan
dc.contributor.author	Schiborr, Nils Norman
dc.date.accessioned	2022-01-24T19:37:25Z
dc.date.available	2022-01-24T19:37:25Z
dc.date.issued	2021
dc.description.abstract	Data from under-researched languages are now available in sufficient quantity and quality to feed into corpus-based approaches to language typology. In this paper we present Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), a project designed to facilitate cross-linguistic comparison of naturalistic discourse across typologically diverse languages, which implements a purpose-built shared annotation scheme. After sketching the rationale and architecture of Multi-CAST, we illustrate the efficacy of the method with two case-studies: The first one investigates the rates of lexical (as opposed to pronominal and zero) realization of arguments in discourse across a sample of 15 typologically diverse languages. Our results reveal a remarkable and hitherto unnoticed uniformity in the density of lexical references, despite the lack of content control in the corpora. The second addresses the question of whether cross-linguistically attested regularities in morphosyntax can meaningfully be related to frequency effects in discourse. We find some support for frequency-based explanations, but our data also show that the frequency accounts leave several key questions unanswered. Overall, our findings underscore that research based on language documentation-derived corpus data, and in particular spoken language data, is not only possible, but in fact crucially necessary for testing frequency-based explanations, because these data stem from spoken language and typologically diverse languages. We also identify a number of epistemological and methodological shortcomings with our approach, and discuss some of the requirements for further innovation in areas of corpus building, corpus annotation, and typological comparability.
dc.identifier.citation	Haig, Geoffrey & Schnell, Stefan & Schiborr, Nils N. 2021. Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora. In Haig, Geoffrey & Schnell, Stefan & Seifart, Frank (eds.), Doing corpus-based typology with spoken language data: State of the art, 141–177. Honolulu, HI: University of Hawai'i Press.
dc.identifier.isbn	978-0-9979673-0-2
dc.identifier.uri	http://hdl.handle.net/10125/74660
dc.publisher	University of Hawai'i Press
dc.relation.ispartofseries	LD&C Special Publication
dc.rights	Creative Commons Attribution Non-Commercial Share-Alike Licence
dc.subject	corpus-based typology
dc.subject	universals of language use
dc.subject	discourse structure
dc.subject	referential choice
dc.subject	marking asymmetries
dc.title	Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora

Files

Original bundle

Name: LD&C-SP25__5_Haig+Schnell+Schiborr.pdf
Size: 264.09 KB
Format: Adobe Portable Document Format

Collections

LD&C Special Publication No. 25: Doing Corpus-Based Typology With Spoken Language Corpora