Evaluating Topic Models with OpenAI Embeddings: A Comparative Analysis on Variable-Length Texts Using Two Datasets

Date

2025-01-07

Starting Page

1571

Abstract

Topic modeling is a crucial unsupervised machine learning technique for identifying themes within unstructured text. This study compares traditional topic modeling methods, such as Latent Dirichlet Allocation (LDA), with embedding-based models, specifically BERTopic-OpenAI. The analysis uses two distinct datasets: user reviews of the mental health app Replika and the 20 Newsgroups dataset. On the Replika dataset, both methods identified common themes, but BERTopic-OpenAI uncovered additional, more nuanced topics, demonstrating its stronger semantic capabilities. Quantitative evaluation on the 20 Newsgroups dataset further showed BERTopic-OpenAI's advantage: it achieved higher topic coherence and diversity than the best-performing LDA model. These results suggest that embedding-based models produce more coherent, interpretable, and diverse topics, making them valuable tools for extracting meaningful insights from large corpora of variable-length text. Future research should focus on refining these techniques to improve their applicability and effectiveness in dynamic and varied textual environments.
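
As a rough illustration of the diversity measure mentioned in the abstract, the sketch below computes one common definition of topic diversity: the fraction of unique words among the top-k words of all topics. The abstract does not state which definition the study used, so this formula, the function name topic_diversity, and the example topics are assumptions made for illustration only, not the authors' method or data.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    Assumed definition: 1.0 means no word is shared between topics,
    values near 0 mean the topics largely repeat the same words.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, each a ranked list of top words (not from the study).
example_topics = [
    ["friend", "chat", "companion", "talk"],
    ["subscription", "price", "pay", "chat"],
]

# "chat" appears in both topics, so 7 of the 8 top words are unique.
score = topic_diversity(example_topics, top_k=4)
print(score)
```

Under this definition a higher score indicates less word overlap between topics. Topic coherence is usually computed separately from word co-occurrence statistics and is not covered by this sketch.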

Keywords

Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences, coherence, diversity, embeddings, interpretability, openai, topic models

Extent

10

Related To

Proceedings of the 58th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
