Benchmarking Cluster-Then-Predict Models to Challenge Prevailing Global Machine Learning Models
Starting Page
861
Abstract
In predictive analytics domains such as healthcare, marketing and finance, data exhibit inherent segmentation, for example into patient, customer and market segments. Powerful global models such as XGBoost or CatBoost offer high predictive performance, yet they do not model clusters explicitly and offer limited interpretability. Cluster-then-predict (CTP) models have been proposed to provide more actionable insights. These hybrid models first segment the data and then train cluster-specific linear models, combining the capacity to model complex relationships with model transparency. Previous CTP approaches rely on decision trees (DTs) for segmentation, neglecting alternative methods. This study proposes six CTP models and benchmarks them against five global models. Our results show that k-means CTP ranks fourth out of eleven models across 20 benchmark datasets. While CTP models with DTs rank fifth, they are substantially simpler to interpret. Consequently, we establish a variety of cluster-then-predict models and call for their consideration when faced with heterogeneous datasets.
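The two-stage cluster-then-predict idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a regression target, uses a from-scratch k-means (Lloyd's algorithm) for the segmentation step, and fits one ordinary-least-squares linear model per cluster. The names `kmeans` and `ClusterThenPredict` and the parameters `k` and `seed` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a 2-D array X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct training points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


class ClusterThenPredict:
    """Hypothetical k-means CTP regressor: segment first, then one linear model per segment."""

    def __init__(self, k=2, seed=0):
        self.k = k
        self.seed = seed

    def _assign(self, X):
        # Route each row to the cluster with the nearest centroid.
        dists = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    def fit(self, X, y):
        # Stage 1: segment the data.
        self.centroids, labels = kmeans(X, self.k, seed=self.seed)
        coefs = []
        for j in range(self.k):
            Xj, yj = X[labels == j], y[labels == j]
            if len(Xj) == 0:
                # Assumption: fall back to a global linear fit for an empty cluster.
                Xj, yj = X, y
            # Stage 2: ordinary least squares with an intercept column.
            A = np.column_stack([np.ones(len(Xj)), Xj])
            coef, *_ = np.linalg.lstsq(A, yj, rcond=None)
            coefs.append(coef)
        self.coefs = np.array(coefs)  # shape (k, n_features + 1)
        return self

    def predict(self, X):
        labels = self._assign(X)
        A = np.column_stack([np.ones(len(X)), X])
        # Apply each row's cluster-specific coefficients.
        return (A * self.coefs[labels]).sum(axis=1)
```

A quick usage example with two segments that follow different linear relationships (y = 2x near x = 0, and y = -3x + 40 near x = 10), where a single global linear model would fit poorly:

```python
X = np.concatenate([np.linspace(0, 1, 20), np.linspace(10, 11, 20)]).reshape(-1, 1)
y = np.where(X[:, 0] < 5, 2 * X[:, 0], -3 * X[:, 0] + 40)
model = ClusterThenPredict(k=2).fit(X, y)
model.predict(np.array([[0.5], [10.5]]))  # approximately [1.0, 8.5]
```

Each cluster's coefficients remain directly readable, which is the interpretability argument made for CTP models. A DT-based CTP variant would replace the k-means step with the leaf assignments of a shallow decision tree, so that each segment is also described by a readable rule path.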
Extent
10
Type
Conference Paper
Related To
Proceedings of the 58th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
