Archetype Discovery from Taxonomies: A Method to Cluster Small Datasets of Categorical Data

dc.contributor.author: Lenssen, Lars
dc.contributor.author: Stahmann, Philip
dc.contributor.author: Janiesch, Christian
dc.contributor.author: Schubert, Erich
dc.date.accessioned: 2024-12-26T21:05:24Z
dc.date.available: 2024-12-26T21:05:24Z
dc.date.issued: 2025-01-07
dc.description.abstract: This study investigates the challenges of clustering small categorical datasets, particularly in the context of taxonomy-based archetype formation. Taxonomies, such as the Linnaean system, are vital for organizing knowledge across diverse domains and can be used as code books. Archetypes then represent common patterns across the entities. While cluster analysis is a powerful tool for uncovering unknown patterns, traditional clustering methods are predominantly distance-based and optimized for continuous data, which makes them inadequate for categorical data, where similarity is not easily quantifiable. Common distance measures, like Euclidean and Manhattan distances, fail to capture meaningful relationships in categorical datasets. This work addresses this gap by exploring information-theoretic approaches to develop a novel clustering method, CatRED, tailored for small categorical datasets such as taxonomy data. We evaluate our method through its application to two taxonomy datasets, demonstrating its effectiveness in generating archetypes.
dc.format.extent: 10
dc.identifier.doi: https://doi.org/10.24251/HICSS.2025.145
dc.identifier.isbn: 978-0-9981331-8-8
dc.identifier.other: 3f828c7c-f1fb-493e-b5d8-243c39d35548
dc.identifier.uri: https://hdl.handle.net/10125/108984
dc.relation.ispartof: Proceedings of the 58th Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Data Science and Machine Learning to Support Business Decisions
dc.subject: archetype, categorical data, cluster analysis, taxonomy
dc.title: Archetype Discovery from Taxonomies: A Method to Cluster Small Datasets of Categorical Data
dc.type: Conference Paper
dc.type.dcmi: Text
prism.startingpage: 1223

Files

Original bundle

Name: 0121.pdf
Size: 640.97 KB
Format: Adobe Portable Document Format