LSI and DBSCAN: Natural language processing for sociolinguistic analysis

File: 25326.mp3
Size: 53.95 MB
Format: MP3

Item Summary

Title: LSI and DBSCAN: Natural language processing for sociolinguistic analysis
Issue Date: 12 Mar 2015
Description: The analysis of sociolinguistic and anthropological information remains an open problem in the contemporary social sciences. Although statistical analyses can identify and quantify correlations and other relationships, qualitative information is much more difficult to examine. This includes descriptions of sociolinguistic contexts such as those found in the Endangered Languages Catalog (The Linguist List at Eastern Michigan University and The University of Hawaii at Mānoa, 2012). Natural language processing techniques such as latent semantic indexing (LSI, also known as latent semantic analysis) make it possible to quantify sociolinguistic descriptions to some degree.
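The abstract does not say how the catalog descriptions were preprocessed or how many latent dimensions were kept. The following is a minimal NumPy sketch of the general LSI idea, not the author's pipeline. It assumes simple lowercase word tokenization, a smoothed TF-IDF weighting, and k = 2 latent dimensions. The four descriptions are invented for illustration and do not come from the Endangered Languages Catalog, and the helper names (`build_term_document_matrix`, `tfidf`, `lsi`, `cosine_similarity_matrix`) are hypothetical.

```python
import re
import numpy as np

def build_term_document_matrix(docs):
    """Return (vocabulary, counts) where counts[i, j] is how often term i occurs in doc j."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            counts[index[t], j] += 1
    return vocab, counts

def tfidf(counts):
    """Weight raw counts by a smoothed inverse document frequency (one common variant)."""
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)
    idf = np.log((1 + n_docs) / (1 + df)) + 1
    return counts * idf[:, None]

def lsi(weights, k):
    """Project documents into a k-dimensional latent space with a truncated SVD."""
    _, s, vt = np.linalg.svd(weights, full_matrices=False)
    # Row j of the result is document j expressed in the k latent dimensions.
    return (np.diag(s[:k]) @ vt[:k]).T

def cosine_similarity_matrix(x):
    """Pairwise cosine similarities between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1, norms)
    return unit @ unit.T

# Hypothetical descriptions, invented for this example.
descriptions = [
    "Elders speak the language at home; children shift to the national language at school.",
    "Children learn the national language at school and rarely speak the heritage language at home.",
    "A government language policy funds immersion schools and radio broadcasts.",
    "Immersion schools and radio broadcasts are funded under a regional language policy.",
]

vocab, counts = build_term_document_matrix(descriptions)
vecs = lsi(tfidf(counts), k=2)        # one 2-D vector per description
sims = cosine_similarity_matrix(vecs)
print(np.round(sims, 2))
```

Descriptions of similar situations (here, intergenerational shift versus immersion-school policy) should land close together in the latent space even when their wording differs. For larger corpora, scikit-learn's `TfidfVectorizer` and `TruncatedSVD` implement the same pipeline.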

By quantifying natural language semantics, this approach makes the analysis of sociolinguistic information less subjective, although the analysis is still performed on descriptions written by humans. Furthermore, when natural language processing is combined with document clustering techniques such as DBSCAN (Kriegel et al., 2011), it becomes possible to recognize previously overlooked relationships between disparate languages. Because this approach is fast and broad in scope, it can recognize relationships between any pair of languages, regardless of their geographic or genetic distance. This can provide insight into the effectiveness of different conservation techniques and language policies, since descriptions of these parameters commonly appear in natural language publications.
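The abstract names DBSCAN but not its parameters or distance function. The sketch below implements the standard DBSCAN procedure over a precomputed distance matrix. It assumes cosine distance between LSI document vectors. The six 2-D vectors, `eps=0.1`, and `min_pts=2` are hypothetical values chosen only to show the behavior, and the helper names are illustrative.

```python
import numpy as np

NOISE = -1

def cosine_distance_matrix(x):
    """Pairwise cosine distances (1 - cosine similarity) between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1, norms)
    return 1.0 - unit @ unit.T

def dbscan(dist, eps, min_pts):
    """DBSCAN over a precomputed distance matrix.

    Returns one label per point: 0, 1, ... for clusters and NOISE (-1) for noise.
    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    n = dist.shape[0]
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = np.flatnonzero(dist[p] <= eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1
        labels[p] = cluster
        queue = [int(q) for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[q] is not None:
                continue  # already assigned
            labels[q] = cluster
            q_neighbors = np.flatnonzero(dist[q] <= eps)
            if len(q_neighbors) >= min_pts:  # q is a core point: keep expanding
                queue.extend(int(r) for r in q_neighbors)
    return labels

# Hypothetical 2-D LSI vectors for six language descriptions.
vectors = np.array([
    [1.00, 0.05],
    [0.95, 0.10],
    [0.90, 0.00],
    [0.05, 1.00],
    [0.10, 0.92],
    [-0.70, -0.70],  # an outlying description
])

labels = dbscan(cosine_distance_matrix(vectors), eps=0.1, min_pts=2)
print(labels)  # → [0, 0, 0, 1, 1, -1]
```

Descriptions 0 to 2 and 3 to 4 form two clusters. The outlying description is marked as noise (`-1`) rather than forced into a cluster, which is one reason density-based clustering suits exploratory comparisons where the number of groups is not known in advance. scikit-learn's `sklearn.cluster.DBSCAN` (with `eps`, `min_samples`, and `metric="cosine"`) offers an optimized implementation of the same algorithm.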

References

Hans-Peter Kriegel, Peer Kröger, Jörg Sander, and Arthur Zimek. Density-based clustering. WIREs Data Mining and Knowledge Discovery, 1(3):231–240, 2011.

The Linguist List at Eastern Michigan University and The University of Hawaii at Mānoa. Endangered languages, 2012. URL http://www.endangeredlanguages.com.
URI/DOI: http://hdl.handle.net/10125/25326
Rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported
Appears in Collections: 4th International Conference on Language Documentation and Conservation (ICLDC)
