Syntactic complexity and its development in early learners of Mandarin

dc.contributor.advisor Gilliland, Elizabeth (Betsy)
dc.contributor.author DeVore, Susanne
dc.contributor.department Second Language Studies
dc.date.accessioned 2022-07-05T19:58:56Z
dc.date.available 2022-07-05T19:58:56Z
dc.date.issued 2022
dc.description.degree Ph.D.
dc.identifier.uri https://hdl.handle.net/10125/102277
dc.subject Linguistics
dc.subject complexity
dc.subject learner corpus research
dc.subject Mandarin
dc.subject network science
dc.subject quantitative
dc.subject Second language development
dc.title Syntactic complexity and its development in early learners of Mandarin
dc.type Thesis
dcterms.abstract The first part of this dissertation conducts a large-scale study comparing general and usage-based indices of syntactic complexity in L2 Mandarin writing samples across a wide range of proficiency levels. To do this, the Tool for the Automated Analysis of Syntactic Sophistication and Complexity (TAASSC; Kyle, 2016) was adapted for Mandarin by changing how the syntactic relations were identified and by developing a large (approximately 30-million-word) Chinese language reference corpus. This tool was used to identify and tally both general and usage-based indices of proficiency in a large, publicly available learner corpus. Because learning may not be a linear process, linear and polynomial multiple regression models were built for each type of index, and the two models were compared within each index type. For usage-based indices, the linear model best fit the data, while for general indices, the polynomial model was the best fit. These two best-fitting models were then compared to each other, and the usage-based model was found to be significantly better at explaining variance in the data. Finally, all significant indices were combined into a single model to test for overlap between the two types of index; the results suggest some, but not complete, overlap. The results of the first part of the dissertation indicate that usage-based indices explain more variance in the data than general indices. They are also preferable for other reasons: they predict writing proficiency linearly across levels, are ecologically valid, can be used to compare complexity cross-linguistically, and are consistent with other research on Mandarin that focuses on topic-comment structures. Based on the results detailed in Chapter 3, Chapter 4 turns to the processes of development. However, there are some additional challenges to understanding development that must be considered.
First, current usage-based indices mainly focus on verb-VAC (verb argument construction) relationships or on words in phrases, but do not take into account the simultaneous development of constructions at the lexical, phrasal, and clausal levels. Further complicating the analysis, these constructions are embedded within each other. Finally, a single lexical or phrasal construction can be used in multiple contexts, so there is also overlap between the constructions. Because these challenges add complexity to the analysis, Network Science is used to combine usage-based approaches with those based in complex dynamic systems theory. This allows for a more holistic analysis that captures changes at and across lexical, phrasal, and clausal levels. In this dissertation, five macro-level (or full network) indices were first analyzed because they are theoretically related to the cognitive processes associated with language learning. These were analyzed longitudinally in two early learners of Mandarin and cross-sectionally using the same large-scale corpus and statistical methods used in Chapter 3. Average betweenness (the degree to which nodes act as bridges across subsystems) and average degrees by type (the degree of schematization) both showed clear linear trends in the longitudinal data and were significant linear predictors of proficiency across a wide range of proficiency levels. A model that combined these network science indices with the previously identified significant usage-based and general indices indicates that network science indices explain an additional 10% of variance. Network science, usage-based, and general indices were all included in the final model, indicating that all three target different aspects of complexity. Average betweenness and average degrees raise additional questions that are also explored in Chapter 4. Average betweenness is often described in the literature as the degree to which a node bridges different subsystems, which raises the question: What subsystems?
Subsystems are identified quantitatively using community structure and then qualitatively analyzed to determine what (if any) linguistic features they represent. The changes in these structures are then traced over time to analyze change in the learner’s linguistic system. Finally, we would expect that average degrees (or degree of schematization) would be related to the target language. To analyze this, a small target language corpus was developed based on the textbook used by participants in the longitudinal study. The learner and target language networks are then compared. There are several key findings of this dissertation with implications for research, instruction, and assessment. The first part of the dissertation indicates that usage-based indices are better than general measures as predictors of proficiency in L2 Mandarin writing across a wide range of proficiency levels. This is consistent with current complexity research on topic-comment structures, since both are based on the features of the target language. The second part of the dissertation introduces and explores Network Science as a framework for analyzing the development of syntactic structures in L2 writing. It first identifies betweenness as a new and effective macro-level measure of proficiency and development. While this has no parallel within other frameworks, it can be conceptualized as the degree of integration of words across different linguistic constructions and is related to Langacker's (1999) proposal that knowing a construction in one context should facilitate the acquisition of related constructions in other contexts. Moving to the meso-level (the phrasal/clausal level), this dissertation identifies community structure as an effective way to identify the constructions that emerge. These community structures can then be used to analyze how structures develop over time, as well as the nature of that change.
Finally, although the outcomes in this study were somewhat unexpected, this dissertation also shows how the learner’s production and target language networks can be compared to examine the relationship between what the learner produces and the target language. The findings of the second part of the study indicate that network science can be used to combine usage-based theories of language acquisition and complex dynamic systems theory to explain development in a more holistic way than either has done independently.
dcterms.extent 189 pages
dcterms.language en
dcterms.publisher University of Hawai'i at Manoa
dcterms.rights All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
dcterms.type Text
local.identifier.alturi http://dissertations.umi.com/hawii:11356
Files
Original bundle
Name: DeVore_hawii_0085A_11356.pdf
Size: 7.62 MB
Format: Adobe Portable Document Format