Please use this identifier to cite or link to this item:
http://hdl.handle.net/10125/63836
Non-Exhaustive, Overlapping k-medoids for Document Clustering
Item Summary
Title: | Non-Exhaustive, Overlapping k-medoids for Document Clustering |
Authors: | Kerstens, Eric |
Keywords: | Text Analytics, disjunctive, document clustering, outlier detection, overlapping |
Date Issued: | 07 Jan 2020 |
Abstract: | Manual document categorization is time-consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when documents belong to only one group. However, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm called non-exhaustive overlapping k-medoids inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity instead of distance measures and applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are primarily poor but show promise. Future research is described to improve the proposed algorithm and explore alternate evaluation measures. |
Pages/Duration: | 10 pages |
URI: | http://hdl.handle.net/10125/63836 |
ISBN: | 978-0-9981331-3-3 |
DOI: | 10.24251/HICSS.2020.097 |
Rights: | Attribution-NonCommercial-NoDerivatives 4.0 International https://creativecommons.org/licenses/by-nc-nd/4.0/ |
Appears in Collections: | Text Analytics |
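
The abstract describes assigning each object to zero, one, or many of k clusters using only pairwise dissimilarities. This record gives no algorithmic detail, so the following is only a rough illustrative sketch of that idea and not the paper's method. The function name, the `alpha` (overlap) and `beta` (outlier) parameters, the naive initialisation, and the greedy two-pass assignment are all assumptions made for illustration, loosely in the spirit of non-exhaustive overlapping k-means.

```python
from itertools import product


def neo_k_medoids_sketch(dissim, k, alpha=0.0, beta=0.0, max_iter=50):
    """Toy overlapping, non-exhaustive k-medoids on a dissimilarity matrix.

    dissim : square list of lists with dissim[i][j] >= 0
    alpha  : hypothetical overlap knob (extra assignments as a fraction of n)
    beta   : hypothetical outlier knob (fraction of objects that may get no cluster)
    Returns (medoids, clusters); clusters is a list of k sets of object indices.
    """
    n = len(dissim)
    medoids = list(range(k))               # naive initialisation: first k objects
    n_primary = n - int(beta * n)          # objects guaranteed one cluster
    n_total = n + int(alpha * n)           # total number of assignments to make
    clusters = [set() for _ in range(k)]
    for _ in range(max_iter):
        # Rank every (object, cluster) pair by dissimilarity to that cluster's medoid.
        pairs = sorted(product(range(n), range(k)),
                       key=lambda p: (dissim[p[0]][medoids[p[1]]], p))
        new_clusters = [set() for _ in range(k)]
        assigned = set()
        # Pass 1: the n_primary objects closest to any medoid join their nearest cluster.
        for i, c in pairs:
            if len(assigned) == n_primary:
                break
            if i not in assigned:
                assigned.add(i)
                new_clusters[c].add(i)
        # Pass 2: spend the remaining assignments on the closest unused pairs.
        # This is where overlap arises; objects never picked stay unassigned.
        remaining = n_total - n_primary
        for i, c in pairs:
            if remaining == 0:
                break
            if i not in new_clusters[c]:
                new_clusters[c].add(i)
                remaining -= 1
        # Update: each medoid becomes the member with the least total dissimilarity.
        new_medoids = [
            min(sorted(members), key=lambda m: sum(dissim[m][j] for j in members))
            if members else medoids[c]
            for c, members in enumerate(new_clusters)
        ]
        converged = new_medoids == medoids and new_clusters == clusters
        medoids, clusters = new_medoids, new_clusters
        if converged:
            break
    return medoids, clusters


# Made-up 1-D data: two groups, one in-between point (6.0), one far point (100.0).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 6.0, 100.0]
dissim = [[abs(a - b) for b in points] for a in points]
medoids, clusters = neo_k_medoids_sketch(dissim, k=2, alpha=0.0, beta=0.125)
print(medoids, clusters)
```

On this toy data the in-between point can end up in both clusters while the far point is left in none, which mirrors the "zero, one, or many groups" behaviour the abstract describes. For documents, the absolute difference would be replaced by whatever pairwise dissimilarity is appropriate for the text representation in use.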