Non-Exhaustive, Overlapping k-medoids for Document Clustering

Kerstens, Eric

dc.contributor.author	Kerstens, Eric
dc.date.accessioned	2020-01-04T07:18:06Z
dc.date.available	2020-01-04T07:18:06Z
dc.date.issued	2020-01-07
dc.description.abstract	Manual document categorization is time-consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when each document belongs to only one group. However, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm, non-exhaustive overlapping k-medoids (neo-k-medoids), inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity measures instead of distance measures, so it applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are mostly poor but show promise. Future research is described to improve the proposed algorithm and explore alternate evaluation measures.
dc.format.extent	10 pages
dc.identifier.doi	10.24251/HICSS.2020.097
dc.identifier.isbn	978-0-9981331-3-3
dc.identifier.uri	http://hdl.handle.net/10125/63836
dc.language.iso	eng
dc.relation.ispartof	Proceedings of the 53rd Hawaii International Conference on System Sciences
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Text Analytics
dc.subject	disjunctive
dc.subject	document clustering
dc.subject	outlier detection
dc.subject	overlapping
dc.title	Non-Exhaustive, Overlapping k-medoids for Document Clustering
dc.type	Conference Paper
dc.type.dcmi	Text

Files

Original bundle

Name: 0078.pdf
Size: 316.7 KB
Format: Adobe Portable Document Format

Collections

Text Analytics