Archetype Discovery from Taxonomies: A Method to Cluster Small Datasets of Categorical Data

Starting Page

1223

Abstract

This study investigates the challenges of clustering small categorical datasets, particularly in the context of taxonomy-based archetype formation. Taxonomies, such as the Linnaean system, are vital for organizing knowledge across diverse domains and can be used as codebooks. Archetypes then represent common patterns across the entities. While cluster analysis is a powerful tool for uncovering unknown patterns, traditional clustering methods are predominantly distance-based and optimized for continuous data, which makes them inadequate for categorical data, where similarity is not easily quantifiable. Common distance measures, such as the Euclidean and Manhattan distances, fail to capture meaningful relationships in categorical datasets. This work addresses this gap by exploring information-theoretic approaches to develop CatRED, a novel clustering method tailored to small categorical datasets such as taxonomy data. We evaluate the method by applying it to two taxonomy datasets, demonstrating its effectiveness in generating archetypes.
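
The abstract's point about distance measures can be illustrated with a minimal sketch. It is not the CatRED method, whose details are not given on this page. It assumes a hypothetical attribute with three values and two arbitrary integer encodings, and shows that a Euclidean distance on the codes depends on the labelling, whereas an information-theoretic quantity such as Shannon entropy does not. The names `coded_distance` and `shannon_entropy` are illustrative only.

```python
import math
from collections import Counter

# Two arbitrary integer encodings of the same hypothetical attribute.
encoding_1 = {"red": 0, "green": 1, "blue": 2}
encoding_2 = {"green": 0, "blue": 1, "red": 2}


def coded_distance(a, b, encoding):
    """Euclidean distance between two category values after integer coding."""
    return math.sqrt((encoding[a] - encoding[b]) ** 2)


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of the values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# The "closeness" of red and blue depends only on the arbitrary coding.
d1 = coded_distance("red", "blue", encoding_1)
d2 = coded_distance("red", "blue", encoding_2)

# Entropy depends only on how often each value occurs, so renaming
# the categories leaves it unchanged.
sample = ["red", "red", "green", "blue"]
renamed = ["x", "x", "y", "z"]
h_sample = shannon_entropy(sample)
h_renamed = shannon_entropy(renamed)

print(d1, d2, h_sample, h_renamed)
```

Because the two encodings disagree on which pairs of values are close, any clustering built on such distances inherits an ordering that the data does not contain; frequency-based measures avoid this particular problem.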

Extent

10

Related To

Proceedings of the 58th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
