On the Feasibility of Vision-Language Models for Time-Series Classification
Starting Page: 1065
Abstract
We explore the feasibility of applying Vision-Language Models (VLMs) to time-series classification (TSC) by fine-tuning them with minimal supervision. We develop a novel approach that pairs graphical data representations, supplied as images, with numerical data. This approach is rooted in the hypothesis that graphical representations can provide additional contextual information that numerical data alone may not capture. Additionally, a graphical representation can circumvent issues faced by LLMs, such as limited context length. To study this systematically, we implement a scalable end-to-end pipeline that supports multiple scenarios, varying context length, downsampling strategy, and prompt design. Using this pipeline, we fine-tune VLMs for only one to two epochs on univariate and multivariate datasets, and analyze how design choices affect accuracy. Our findings position VLMs as a feasible but currently limited baseline for TSC, and point toward design considerations for future work at the intersection of multimodal learning and temporal data.
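The abstract's data-preparation steps (downsampling a series to fit a context budget, then pairing a plot image with the raw numbers in a classification prompt) can be sketched as follows. This is an illustrative sketch, not the paper's actual pipeline; the function names, the uniform-stride downsampling choice, and the prompt wording are all assumptions, and the image itself would be rendered separately (e.g. with a plotting library) and attached alongside the text.

```python
# Hypothetical sketch of the abstract's pipeline: downsample a univariate
# series to fit a model's context budget, then build a prompt pairing the
# numeric values with an attached plot image. Names are illustrative only.

def downsample(series, max_points):
    """Keep at most max_points values, sampled at a uniform stride."""
    if len(series) <= max_points:
        return list(series)
    stride = len(series) / max_points
    return [series[int(i * stride)] for i in range(max_points)]

def build_prompt(series, labels, max_points=64):
    """Assemble a text prompt that accompanies the rendered plot image."""
    values = downsample(series, max_points)
    numeric = ", ".join(f"{v:.3f}" for v in values)
    return (
        "The attached image plots the time series below.\n"
        f"Values: [{numeric}]\n"
        f"Classify the series as one of: {', '.join(labels)}."
    )

# Example: a 500-point series reduced to 64 values for the prompt.
prompt = build_prompt([0.1 * i for i in range(500)], ["normal", "anomalous"])
```

Downsampling bounds the numeric portion of the prompt, while the image preserves the overall shape of the full-resolution series, which matches the abstract's motivation for combining the two modalities.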
Extent: 10 pages
Type: Conference Paper
Related To: Proceedings of the 59th Hawaii International Conference on System Sciences
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
