Benchmarking Cluster-Then-Predict Models to Challenge Prevailing Global Machine Learning Models

Starting Page

861

Abstract

In predictive analytics domains such as healthcare, marketing, and finance, data exhibits inherent segmentation, for example into patient, customer, and market segments. Powerful global models, such as XGBoost or CatBoost, offer high predictive quality, yet they do not model clusters explicitly and are limited by low interpretability. Cluster-then-predict (CTP) models have been proposed to offer more actionable insights. These hybrid models first segment the data and then train cluster-specific linear models, combining the capacity to model complex relationships with model transparency. Previous CTP approaches rely on decision trees (DTs) for segmentation, neglecting alternative methods. This study proposes six CTP models and benchmarks them against five global models. Our results show that k-means CTP ranks fourth out of eleven models across 20 benchmark datasets. While CTP models with DTs rank fifth, they are substantially simpler to interpret. Consequently, we establish a variety of cluster-then-predict models and call for their consideration when faced with heterogeneous datasets.
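
The two-stage idea described in the abstract (segment first, then fit one linear model per segment) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify preprocessing, hyperparameters, or the exact linear learner, so the function names (`fit_ctp`, `predict_ctp`), the farthest-point initialisation, the use of ordinary least squares, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def fit_ctp(X, y, k=2, n_iter=50):
    """Hypothetical k-means cluster-then-predict fit (illustrative only)."""
    # Assumption: deterministic farthest-point initialisation, chosen here
    # only so that the example is reproducible.
    centers = [X[0].astype(float)]
    while len(centers) < k:
        nearest = sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[nearest.argmax()].astype(float))
    centers = np.array(centers)

    # Step 1: segment the data with plain Lloyd iterations.
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = sq_dists(X, centers).argmin(axis=1)

    # Step 2: fit one least-squares linear model (intercept + slopes) per cluster.
    coefs = []
    for j in range(k):
        mask = labels == j
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        coefs.append(beta)
    return centers, np.array(coefs)


def predict_ctp(X, centers, coefs):
    """Route each row to its nearest centroid and apply that cluster's model."""
    labels = sq_dists(X, centers).argmin(axis=1)
    A = np.column_stack([np.ones(len(X)), X])
    return np.einsum("ij,ij->i", A, coefs[labels])


# Synthetic, noise-free data with two segments that follow different linear laws.
x_a = np.linspace(0.0, 1.0, 20)
x_b = np.linspace(10.0, 11.0, 20)
X = np.concatenate([x_a, x_b]).reshape(-1, 1)
y = np.concatenate([2.0 * x_a + 1.0, -3.0 * x_b + 5.0])

centers, coefs = fit_ctp(X, y, k=2)
pred = predict_ctp(X, centers, coefs)
print(coefs)  # one (intercept, slope) row per cluster
```

In a sketch like this, each row of `coefs` can be read directly as the linear relationship within one segment, which is the kind of transparency the abstract attributes to CTP models. Swapping the k-means step for a decision tree or another clustering method would change only the segmentation stage.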

Extent

10

Type

Conference Paper

Related To

Proceedings of the 58th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
