LSI and DBSCAN: Natural language processing for sociolinguistic analysis

dc.contributor.author: Collard, Jacob
dc.contributor.speaker: Collard, Jacob
dc.date.accessioned: 2015-03-12T20:33:43Z
dc.date.available: 2015-03-12T20:33:43Z
dc.date.begin: 2015-02-28
dc.date.finish: 2015-02-28
dc.date.issued: 2015-03-12
dc.description: How to analyze sociolinguistic and anthropological information remains an open question in the contemporary social sciences. Although statistical analysis can identify and quantify correlations and other relationships, it is much more difficult to examine qualitative information, including descriptions of sociolinguistic contexts such as those found in the Endangered Languages Catalog (The Linguist List at Eastern Michigan University and The University of Hawaii at Mānoa, 2012). Natural language processing techniques such as latent semantic indexing (LSI), also known as latent semantic analysis, make it possible to quantify sociolinguistic descriptions to a certain degree. Quantifying natural language semantics makes sociolinguistic analysis less subjective, though the analysis is still performed on descriptions written by humans. Furthermore, when combined with document clustering techniques such as DBSCAN (Kriegel et al., 2011), natural language processing can also reveal relationships between disparate languages that have hitherto been overlooked. Because of the speed and breadth of this approach, it can recognize relationships between any languages, regardless of geographic or genetic distance. This can provide insights into the effectiveness of different conservation techniques and language policies, as descriptions of these parameters are commonly found in natural language publications. References: Hans-Peter Kriegel, Peer Kröger, Jörg Sander, and Arthur Zimek. Density-based clustering. WIREs Data Mining and Knowledge Discovery, 1(3):231–240, 2011. The Linguist List at Eastern Michigan University and The University of Hawaii at Mānoa. Endangered languages, 2012. URL http://www.endangeredlanguages.com.
dc.identifier.uri: http://hdl.handle.net/10125/25326
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
dc.title: LSI and DBSCAN: Natural language processing for sociolinguistic analysis
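
The pipeline outlined in the description (LSI to embed text descriptions, then DBSCAN to cluster them) can be illustrated with a minimal sketch. This is not the presenter's implementation. The six toy descriptions, the stopword list, the use of raw term counts rather than a weighting such as TF-IDF, and the parameter values (k = 2 latent dimensions, eps = 0.3, min_samples = 2) are all assumptions chosen for illustration. The sketch uses NumPy's SVD for the LSI step and a small hand-written DBSCAN over cosine distances.

```python
import re
import numpy as np

# Hypothetical toy descriptions; NOT taken from the Endangered Languages Catalog.
DOCS = [
    "children learn the language at home and at school",
    "the language is taught at school and spoken at home",
    "children speak the language at home",
    "only elderly speakers remain and transmission has stopped",
    "elderly speakers remain but transmission to youth has stopped",
    "few elderly speakers remain with no transmission to youth",
]
# Assumed stopword list, just large enough for the toy corpus.
STOPWORDS = {"the", "at", "and", "is", "but", "to", "has", "with", "no"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lsi_embed(docs, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))  # document-term count matrix
    for row, doc in enumerate(tokens):
        for w in doc:
            counts[row, index[w]] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]  # keep only the k largest singular directions


def cosine_distances(vectors):
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    return 1.0 - unit @ unit.T


def dbscan(dist, eps, min_samples):
    """Basic DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = dist.shape[0]
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i, j] <= eps]
        if len(neighbors) < min_samples:  # not a core point
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [m for m in range(n) if dist[j, m] <= eps]
            if len(j_neighbors) >= min_samples:  # expand only from core points
                queue.extend(j_neighbors)
    return labels


embedding = lsi_embed(DOCS, k=2)
labels = dbscan(cosine_distances(embedding), eps=0.3, min_samples=2)
print(labels)
```

On this toy corpus the first three descriptions (active transmission) and the last three (interrupted transmission) should fall into separate clusters. With real catalog text, the number of latent dimensions and the DBSCAN parameters would need tuning, and a library implementation would likely be preferable to the hand-written one shown here.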

Files

Original bundle
Name: 25326.mp3
Size: 52.69 MB
Format: Moving Picture Experts Group Layer-3 Audio