Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History

dc.contributor.author: Peladeau, Normand
dc.contributor.author: Davoodi, Elnaz
dc.date.accessioned: 2017-12-28T00:38:42Z
dc.date.available: 2017-12-28T00:38:42Z
dc.date.issued: 2018-01-03
dc.description.abstract: Topic modeling is often perceived as a relatively new development in information retrieval sciences, and new methods such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation have generated a lot of research. However, attempts to extract topics from unstructured text using Factor Analysis techniques can be found as early as the 1960s. This paper compares the perceived coherence of topics extracted on three different datasets using Factor Analysis and Latent Dirichlet Allocation. To perform such a comparison a new extrinsic evaluation method is proposed. Results suggest that Factor Analysis can produce topics perceived by human coders as more coherent than Latent Dirichlet Allocation and warrant a revisit of a topic extraction method developed more than fifty-five years ago, yet forgotten.
dc.format.extent: 9 pages
dc.identifier.doi: 10.24251/HICSS.2018.078
dc.identifier.isbn: 978-0-9981331-1-9
dc.identifier.uri: http://hdl.handle.net/10125/49965
dc.language.iso: eng
dc.relation.ispartof: Proceedings of the 51st Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Text Mining in Big Data Analytics
dc.subject: Factor Analysis, Latent Dirichlet Allocation, Text Mining, Topic Modeling
dc.title: Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History
dc.type: Conference Paper
dc.type.dcmi: Text

Files

Original bundle
Name: paper0078.pdf
Size: 374.82 KB
Format: Adobe Portable Document Format