Streaming Data Analytics and Applications Minitrack

Permanent URI for this collection

https://hdl.handle.net/10125/42432

In stream data analytics, data are analyzed online as they arrive and decisions are made in real time. This minitrack aims to present and share new research that defines and highlights the value of stream data analytics, including new theory, algorithms, innovations in methodology, and benefits from a variety of applications.

Topics include, but are not limited to:

Data stream mining techniques and methodologies
Distributed data stream models
Concept drift in streaming data
Data stream visualization
Real-world applications using streaming data analytics in:

Social networks
Intelligence and cybersecurity
Smart power grid
Sensor networks
Internet of Things

Minitrack Co-Chairs:

Mehmed Kantardzic (Primary Contact)
University of Louisville
Email: mmkant01@exchange.louisville.edu

Jozef Zurada
University of Louisville
Email: jmzura01@louisville.edu

The RADStack: Open Source Lambda Architecture for Interactive Analytics
(2017-01-04) Yang, Fangjin; Merlino, Gian; Ray, Nelson; Léauté, Xavier; Gupta, Himanshu; Tschetter, Eric
The Real-time Analytics Data Stack, colloquially referred to as the RADStack, is an open-source data analytics stack designed to provide fast, flexible queries over up-to-the-second data. It is designed to overcome the limitations of either a purely batch processing system (it takes too long to surface new events) or a purely real-time system (it’s difficult to ensure that no data is left behind and there is often no way to correct data after initial processing). It will seamlessly return best-effort results on very recent data combined with guaranteed-correct results on older data. In this paper, we introduce the architecture of the RADStack and discuss our methods of providing interactive analytics and a flexible data processing environment to handle a variety of real-world workloads.
Sliding Reservoir Approach for Delayed Labeling in Streaming Data Classification
(2017-01-04) Hu, Hanqing; Kantardzic, Mehmed
When concept drift occurs within streaming data, a streaming data classification framework needs to update the learning model to maintain its performance. Labeled samples required for training a new model are often not immediately available in real-world applications. This delay in labels might negatively impact the performance of traditional streaming data classification frameworks. To solve this problem, we propose the Sliding Reservoir Approach for Delayed Labeling (SRADL). By combining chunk-based semi-supervised learning with a novel approach to managing labeled data, SRADL does not need to wait for the labeling process to finish before updating the learning model. Experiments with two delayed-label scenarios show that SRADL improves prediction performance over the naïve approach by as much as 7.5% in certain cases. The largest gain comes from the scenario with an 18-chunk labeling delay and continuous label delivery in experiments on real-world data.
Introduction to Streaming Data Analytics and Applications Minitrack
(2017-01-04) Kantardzic, Mehmed; Zurada, Jozef
