Non-Exhaustive, Overlapping k-medoids for Document Clustering

File      Size      Format
0078.pdf  316.7 kB  Adobe PDF

Item Summary

Title: Non-Exhaustive, Overlapping k-medoids for Document Clustering
Authors: Kerstens, Eric
Keywords: Text Analytics; document clustering; outlier detection
Date Issued: 07 Jan 2020
Abstract: Manual document categorization is time-consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when documents belong to only one group. However, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm called non-exhaustive overlapping k-medoids inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity instead of distance measures and applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are primarily poor but show promise. Future research is described to improve the proposed algorithm and explore alternate evaluation measures.
Pages/Duration: 10 pages
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
Appears in Collections: Text Analytics

This item is licensed under a Creative Commons License.