Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora

dc.contributor.author Haig, Geoffrey
dc.contributor.author Schnell, Stefan
dc.contributor.author Schiborr, Nils Norman
dc.date.accessioned 2022-01-24T19:37:25Z
dc.date.available 2022-01-24T19:37:25Z
dc.date.issued 2021
dc.description.abstract Data from under-researched languages are now available in sufficient quantity and quality to feed into corpus-based approaches to language typology. In this paper we present Multi-CAST (Multilingual Corpus of Annotated Spoken Texts), a project designed to facilitate cross-linguistic comparison of naturalistic discourse across typologically diverse languages, which implements a purpose-built shared annotation scheme. After sketching the rationale and architecture of Multi-CAST, we illustrate the efficacy of the method with two case studies. The first investigates the rates of lexical (as opposed to pronominal and zero) realization of arguments in discourse across a sample of 15 typologically diverse languages. Our results reveal a remarkable and hitherto unnoticed uniformity in the density of lexical references, despite the lack of content control in the corpora. The second addresses the question of whether cross-linguistically attested regularities in morphosyntax can meaningfully be related to frequency effects in discourse. We find some support for frequency-based explanations, but our data also show that the frequency accounts leave several key questions unanswered. Overall, our findings underscore that research based on language documentation-derived corpus data, and in particular spoken language data, is not only possible, but in fact crucially necessary for testing frequency-based explanations, because these data stem from spoken language and typologically diverse languages. We also identify a number of epistemological and methodological shortcomings of our approach, and discuss some of the requirements for further innovation in areas of corpus building, corpus annotation, and typological comparability.
dc.identifier.citation Haig, Geoffrey & Schnell, Stefan & Schiborr, Nils N. 2021. Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora. In Haig, Geoffrey & Schnell, Stefan & Seifart, Frank (eds.), Doing corpus-based typology with spoken language data: State of the art, 141–177. Honolulu, HI: University of Hawai'i Press.
dc.identifier.isbn 978-0-9979673-0-2
dc.identifier.uri http://hdl.handle.net/10125/74660
dc.publisher University of Hawai'i Press
dc.relation.ispartofseries LD&C Special Publication
dc.rights Creative Commons Attribution Non-Commercial Share-Alike Licence
dc.subject corpus-based typology
dc.subject universals of language use
dc.subject discourse structure
dc.subject referential choice
dc.subject marking asymmetries
dc.title Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora
Files
Original bundle
Name: LD&C-SP25__5_Haig+Schnell+Schiborr.pdf
Size: 264.09 KB
Format: Adobe Portable Document Format