Non-Exhaustive, Overlapping k-medoids for Document Clustering

dc.contributor.author: Kerstens, Eric
dc.date.accessioned: 2020-01-04T07:18:06Z
dc.date.available: 2020-01-04T07:18:06Z
dc.date.issued: 2020-01-07
dc.description.abstract: Manual document categorization is time consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when documents belong to only one group. However, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm called non-exhaustive overlapping k-medoids inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity instead of distance measures and applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are primarily poor but show promise. Future research is described to improve the proposed algorithm and explore alternate evaluation measures.
dc.format.extent: 10 pages
dc.identifier.doi: 10.24251/HICSS.2020.097
dc.identifier.isbn: 978-0-9981331-3-3
dc.identifier.uri: http://hdl.handle.net/10125/63836
dc.language.iso: eng
dc.relation.ispartof: Proceedings of the 53rd Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Text Analytics
dc.subject: disjunctive
dc.subject: document clustering
dc.subject: outlier detection
dc.subject: overlapping
dc.title: Non-Exhaustive, Overlapping k-medoids for Document Clustering
dc.type: Conference Paper
dc.type.dcmi: Text

Files

Original bundle
Name: 0078.pdf
Size: 316.7 KB
Format: Adobe Portable Document Format