The RADStack: Open Source Lambda Architecture for Interactive Analytics

Date
2017-01-04
Authors
Yang, Fangjin
Merlino, Gian
Ray, Nelson
Léauté, Xavier
Gupta, Himanshu
Tschetter, Eric
Abstract
The Real-time Analytics Data Stack, colloquially referred to as the RADStack, is an open-source data analytics stack designed to provide fast, flexible queries over up-to-the-second data. It is designed to overcome the limitations of either a purely batch processing system (it takes too long to surface new events) or a purely real-time system (it’s difficult to ensure that no data is left behind and there is often no way to correct data after initial processing). It will seamlessly return best-effort results on very recent data combined with guaranteed-correct results on older data. In this paper, we introduce the architecture of the RADStack and discuss our methods of providing interactive analytics and a flexible data processing environment to handle a variety of real-world workloads.
Keywords
Druid, Samza, Kafka, Streaming, Analytics