Big Data and Analytics: Concepts, Methods, Techniques and Applications Minitrack

This minitrack focuses on the use of big data and analytics to enable businesses and organizations to optimize their operational practices, improve their decision-making, and better understand and provide services to their customers and clients. It seeks papers in all technical areas of big data and analytics, including: technology and infrastructure, storage, management, usage case studies, innovative applications, innovative uses of tools to solve complex problems using big data, metrics for assessing big data value, and enabling technology. It also seeks papers and case studies in relevant organizational and management areas associated with effective big data and analytics practices, including: strategy, governance, security, human resources, work coordination, business process and business impact, among others. Relevant papers are sought on the development of strategy for deploying big data and analytics in distributed organizations (including geographic and virtual entities), the effects of big data and analytics on organizational behavior, and the development of big data analytics. Additionally, papers are sought on developing an analytic cadre, including curriculum concepts, in-house training, and skills development and measurement.

We solicit paper submissions that: advance our knowledge of Big Data storage and structure; help us learn about effective processes and approaches for managing Big Data and the associated business analytics; begin to identify ways to measure the organizational benefits derived from using and analyzing Big Data; present case studies of Big Data implementation and use; address the design and development of analytical frameworks that incorporate multiple analytic methods and techniques based on different architectures and technologies; and address the organizational and business aspects of big data and analytics.

Papers will be solicited in several areas, including, but not limited to, the following:

  • Challenges in managing big data repositories and projects
  • Graph analytics - both syntactic and semantic - that play a big role in the exploitation of social media data
  • Advanced analytics, emphasizing visual analytics and non-numeric analysis models and their implementation as applied to complex problems in different domains
  • Scalable semantic annotation and reasoning across big data stores
  • Metrics for assessing the impact of big data in business, scientific, and governmental decision-making
  • Organizational and business aspects of big data, analytics and data science
  • Crowdsourcing as a distributed, complex analytic tool

Minitrack Co-Chairs:

Stephen Kaisler (Primary Contact)
SHK & Associates
Email: skaisler1@comcast.net

Frank Armour
American University
Email: fjarmour@gmail.com

Alberto Espinosa
American University
Email: alberto@american.edu

Recent Submissions

  • Item
    Value Oriented Big Data Strategy: Analysis & Case Study
    (2017-01-04) Arcondara, Jonathan; Himmi, Khaled; Guan, Peiqing; Zhou, Wei
    Big data has emerged in recent years as an evolutionary phenomenon. Many new concepts and business models driven by data have been introduced as a result. In this research, we are motivated to investigate the value side of big data. We examine the financial statements of CAC40 companies in order to discover the relationship between stock performance and depth of corporate data involvement. Our results are surprisingly twofold. There are companies with strong data capability that succeed in the stock market. There are also companies without much data depth that perform well. Moreover, the results show no link between poorly performing companies and a lack of data capability. To decode this surprising result, we reexamine the existing strategic big data literature and discover the missing puzzle pieces. We thus propose a new strategic model that considers both supply chain decision dynamics and data capability. We explain this model based on an airline industry case study. We draw managerial implications to conclude this paper.
  • Item
    Service-oriented Cost Allocation for Business Intelligence and Analytics: Who pays for BI&A?
    (2017-01-04) Grytz, Raphael; Krohn-Grimberghe, Artus
    Quantifying and designing the cost pool generated by Business Intelligence and Analytics (BI&A) would improve cost transparency and invoicing processes, allowing a fairer, more exact allocation of costs to service consumers. Yet there is still no method for determining BI&A costs to provide a base for allocation purposes. While the literature describes several methods for BI&A cost estimation on an ROI or resource-consumption level, none of these methods considers an overall approach for BI&A. To tackle this problem, we propose a service-oriented cost allocation model which calculates BI&A applications based on defined services, enabling a cost transfer to service consumers. This new approach specifies steps towards deriving a usable pricing scheme for an entire BI&A service portfolio – both for allocation purposes and for improving cost evaluation of BI&A projects. Moreover, it prevents BI&A departments from being considered as the sole cost driver, increasing customer understanding and cost awareness.
  • Item
    Introducing Data Science to Undergraduates through Big Data: Answering Questions by Wrangling and Profiling a Yelp Dataset
    (2017-01-04) Jensen, Scott
    There is an insatiable demand in industry for data scientists, and graduate programs and certificates are gearing up to meet this demand. However, there is agreement in the industry that 80% of a data scientist’s work consists of the transformation and profiling aspects of wrangling Big Data; work that may not require an advanced degree. In this paper we present hands-on exercises to introduce Big Data to undergraduate MIS students using the CoNVO Framework and Big Data tools to scope a data problem and then wrangle the data to answer questions using a real world dataset. This can provide undergraduates with a single course introduction to an important aspect of data science.
  • Item
    Data Systems Fault Coping for Real-time Big Data Analytics Required Architectural Crucibles
    (2017-01-04) Cohen, Stephen; Money, William H.
    This paper analyzes the properties and characteristics of unknown and unexpected faults introduced into information systems while processing Big Data in real-time. The authors hypothesize that there are new faults and new requirements for fault handling, and propose an analytic model and architectural framework to assess and manage the faults, to mitigate the risks of correlating or integrating otherwise uncorrelated Big Data, and to ensure the source pedigree, quality, set integrity, freshness, and validity of data being consumed. We argue that new architectures, methods, and tools for handling and analyzing Big Data systems functioning in real-time must support system designs that address and mitigate concerns for faults resulting from real-time streaming processes while ensuring that variables such as synchronization, redundancy, and latency are addressed. This paper concludes that with improved designs, real-time Big Data systems may continuously deliver the value and benefits of streaming Big Data.
  • Item
    Comparing Data Science Project Management Methodologies via a Controlled Experiment
    (2017-01-04) Saltz, Jeffrey; Shamshurin, Ivan; Crowston, Kevin
    Data Science is an emerging field with a significant research focus on improving the techniques available to analyze data. However, there has been much less focus on how people should work together on a data science project. In this paper, we report on the results of an experiment comparing four different methodologies to manage and coordinate a data science project. We first introduce a model to compare different project management methodologies and then report on the results of our experiment. The results from our experiment demonstrate that there are significant differences based on the methodology used, with an Agile Kanban methodology being the most effective and, surprisingly, an Agile Scrum methodology being the least effective.
  • Item
    An Introduction to the MISD Technology
    (2017-01-04) Popov, Aleksey
    The growth of data volume, velocity, and variety will be global IT challenges in the next decade. To overcome performance limits, the most effective innovations such as cognitive computing, GPU, FPGA acceleration, and heterogeneous computing have to be integrated with the traditional microprocessor technology. As the fundamental part of most computational challenges, discrete mathematics should be supported both by the computer hardware and software. But for now, the optimization methods on graphs and big data sets are generally based on software technology, while hardware support promises better results. In this paper, the new computing technology with direct hardware support of discrete mathematics functions is presented. The new non-Von Neumann microprocessor named Structure Processing Unit (SPU) to perform operations over large data sets, data structures, and graphs was developed and verified at Bauman Moscow State Technical University. The basic principles of SPU implementation in the computer system with multiple instruction and single data stream (MISD) are presented. We then introduce the programming techniques for such a system with CPU and SPU included. The experimental results and performance tests for the universal MISD computer are shown.
  • Item
    A Correlation Network Model for Structural Health Monitoring and Analyzing Safety Issues in Civil Infrastructures
    (2017-01-04) Fuchsberger, Alexander; Ali, Hesham
    Structural Health Monitoring (SHM) is essential to analyze safety issues in civil infrastructures and bridges. With the recent advancements in sensor technology, SHM is moving from occasional or periodic maintenance checks to continuous monitoring. While each technique, whether it relies on assessments or sensors, has its advantages and disadvantages, we propose a method to predict infrastructure health based on representing data streams from multiple sources as a graph model that is more scalable, flexible and efficient than relational or unstructured databases. The proposed approach is centered on the idea of intelligently determining similarities among various structures based on population analysis that can then be visualized and carefully studied. If some “unhealthy” structures are identified through assessments or sensor readings, the model is capable of finding additional structures with similar parameters that need to be carefully inspected. This can save time, cost and effort in inspection cycles, provide increased readiness, help to prioritize inspections, and in general lead to safer, more reliable infrastructures.