On the Feasibility of Vision-Language Models for Time-Series Classification



Starting Page

1065

Abstract

We explore the feasibility of applying Vision–Language Models (VLMs) to time-series classification (TSC) by fine-tuning them with minimal supervision. We develop a novel approach that supplies graphical representations of the data as images alongside the numerical data. This approach is rooted in the hypothesis that graphical representations can provide contextual information that numerical data alone may not capture. Providing a graphical representation can also circumvent limitations such as the restricted context length of large language models (LLMs). To study this systematically, we implement a scalable end-to-end pipeline that supports multiple scenarios, varying context length, downsampling strategy, and prompt design. Using this pipeline, we fine-tune VLMs for only one to two epochs on univariate and multivariate datasets and analyze how design choices affect accuracy. Our findings position VLMs as a feasible but currently limited baseline for TSC and point toward design considerations for future work at the intersection of multimodal learning and temporal data.
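The downsampling and prompt-design choices mentioned in the abstract could be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: the helper names (`downsample`, `build_prompt`), the uniform-stride strategy, and the prompt wording are all hypothetical.

```python
def downsample(series, max_points):
    """Uniform-stride downsampling so the numeric portion of the prompt
    fits within a model's context window (illustrative strategy only)."""
    if len(series) <= max_points:
        return list(series)
    stride = len(series) / max_points
    return [series[int(i * stride)] for i in range(max_points)]


def build_prompt(series, labels, max_points=64):
    """Build a classification prompt pairing a rendered line-plot image
    (attached separately) with a downsampled numeric representation.
    The wording here is an assumption, not the paper's prompt design."""
    values = downsample(series, max_points)
    numeric = ", ".join(f"{v:.3f}" for v in values)
    return (
        "The attached image is a line plot of a time series. "
        f"Downsampled values: [{numeric}]. "
        f"Classify the series as one of: {', '.join(labels)}. "
        "Answer with the label only."
    )
```

Pairing the image with a truncated numeric list reflects the abstract's hypothesis: the plot conveys global shape that a short list of numbers cannot, while the numbers keep precise magnitudes available to the model.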

Extent

10 pages

Type

Conference Paper

Related To

Proceedings of the 59th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
