Please use this identifier to cite or link to this item: http://hdl.handle.net/10125/49965

Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History

Item Summary

Title: Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History
Authors: Peladeau, Normand; Davoodi, Elnaz
Keywords: Text Mining in Big Data Analytics, Factor Analysis, Latent Dirichlet Allocation, Text Mining, Topic Modeling
Date Issued: 03 Jan 2018
Abstract: Topic modeling is often perceived as a relatively new development in information retrieval sciences, and new methods such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation have generated a lot of research. However, attempts to extract topics from unstructured text using Factor Analysis techniques can be found as early as the 1960s. This paper compares the perceived coherence of topics extracted from three different datasets using Factor Analysis and Latent Dirichlet Allocation. To perform such a comparison, a new extrinsic evaluation method is proposed. Results suggest that Factor Analysis can produce topics perceived by human coders as more coherent than those produced by Latent Dirichlet Allocation, and they warrant a revisit of a topic extraction method developed more than fifty-five years ago, yet forgotten.
Pages/Duration: 9 pages
URI: http://hdl.handle.net/10125/49965
ISBN: 978-0-9981331-1-9
DOI: 10.24251/HICSS.2018.078
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Appears in Collections: Text Mining in Big Data Analytics


