Big Data and Analytics: Pathways to Maturity
Recent Submissions
Smart Objects: An Active Big Data Approach
(2018-01-03) Kaisler, Stephen; Money, William; Cohen, Stephen
The world of data and information has been steadily evolving as the complexity and volume of the data processed by our systems have expanded. Big Data has evolved from data that are numbers and characters conceived and collected by individuals, to unstructured data types collected by a variety of devices. Recent work has postulated that the Big Data evolutionary process is making a conceptual leap to incorporate intelligence. This paper proposes that Big Data have not yet made a complete evolutionary leap, but rather that a new class of data, at a higher level of abstraction, is needed to understand and integrate this "intelligence" concept. This paper examines previous definitions, and offers a new definition for Smart Objects (SO) that extends this evolutionary path, examines the basic concept of smart data (is it really exhibiting properties associated with or purported to be intelligence?), and identifies issues and challenges associated with understanding Smart Objects as a new software paradigm. It concludes that Smart Objects incorporate new features and have different properties from passive and inert Big Data.

Counting Human Flow with Deep Neural Network
(2018-01-03) Doong, Shing
Human flow counting has many applications in space management. This study applied channel state information (CSI) available in IEEE 802.11n networks to characterize the flow count. Raw inputs including mean, standard deviation and five-number summary were extracted from windowed CSI data. Due to the large number of raw inputs, stacked denoising autoencoders were used to extract hierarchical features from raw inputs and a final layer of softmax regression was used to model the flow counting problem. This deep neural network structure was found to beat other popular classification algorithms, including random forest, logistic regression, support vector machine and multilayer perceptron, in predicting the flow count with attractive speed performance.

Feature enrichment through multi-gram models
(2018-01-03) Forss, Thomas
We introduce a feature enrichment approach, by developing multi-gram cosine similarity classification models. Our approach combines cosine similarity features of different N-gram word models, and unsupervised sentiment features, into models with a richer feature set than any of the approaches alone can provide. We test the classification models using different machine learning algorithms on categories of hateful and violent web content, and show that our multi-gram models give across-the-board performance improvements, for all categories tested, compared to combinations of baseline unigram, N-gram, and sentiment classification models. Our multi-gram models perform significantly better on highly imbalanced sets than the comparison methods, while this enrichment approach leaves room for further improvements, by adding instead of exhausting optimization options.

An Efficient Recommender System Using Locality Sensitive Hashing
(2018-01-03) Zhang, Kunpeng; Fan, Shaokun; Wang, Harry Jiannan
Recommender systems are widely used for personalized recommendation in many business applications such as online shopping websites and social network platforms. However, with the tremendous growth of recommendation space (e.g., number of users, products, etc.), traditional systems suffer from time and space complexity issues and cannot make real-time recommendations when dealing with large-scale data. In this paper, we propose an efficient recommender system by incorporating the locality sensitive hashing (LSH) strategy. We show that LSH can approximately preserve similarities of data while significantly reducing data dimensions. We conduct experiments on synthetic and real-world datasets of various sizes and data types. The experiment results show that the proposed LSH-based system outperforms traditional item-based collaborative filtering in most cases in terms of statistical accuracy, decision support accuracy, and efficiency. This paper contributes to the fields of recommender systems and big data analytics by proposing a novel recommendation approach that can handle large-scale data efficiently.

Leveraging Big Data Analytics to Improve Quality of Care In Health Care: A fsQCA Approach
(2018-01-03) Wang, Yichuan
Academics across disciplines such as information systems, computer science and healthcare informatics highlight that big data analytics (BDA) has the potential to provide tremendous benefits for healthcare industries. Nevertheless, healthcare organizations continue to struggle to make progress on their BDA initiatives. Drawing on configuration theory, this paper proposes a conceptual framework to explore the impact of BDA on improving quality of care in health care. Specifically, we investigate how BDA capabilities interact with complementary organizational resources and organizational capabilities in multiple configurations to achieve higher quality of care. Fuzzy-set qualitative comparative analysis (fsQCA), which is a relatively new approach, was employed to identify five different configurations that lead to higher quality of care. These findings offer evidence to suggest that a range of solutions leading to better healthcare performance can indeed be identified through the effective use of BDA and other organizational elements.

Data Quality Challenges in Twitter Content Analysis for Informing Policy Making in Health Care
(2018-01-03) Soto, Axel; Ryan, Cynthia; Peña Silva, Fernando; Das, Tapajyoti; Wolkowicz, Jacek; Milios, Evangelos; Brooks, Stephen
Social media platforms and microblogs have become popular fora where the general public expresses opinions and concerns on a variety of matters. As a result, private and public organizations have been looking into ways of finding, understanding and communicating insights extracted from this massive amount of text-based interconnected data. There are, however, important difficulties associated with the noisiness and reliability of the content that hinder the analysis of the data. This paper reports the main challenges found in a real-world experience with social media used as a source of data to support policy making and assessment. We also propose a set of strategies for the precise retrieval of data, the profiling of social media users, and the involvement of policy makers in the analytical process.

Business Intelligence & Analytics Cost Accounting: A Survey on the Perceptions of Stakeholders
(2018-01-03) Grytz, Raphael; Krohn-Grimberghe, Artus
As data-driven decision-making using business intelligence and analytics (BI&A) becomes standard in companies, the importance of mitigating the accompanying growth in costs increases. Research shows that increasing transparency to the granularity of individual BI&A artefacts such as reports or analytic applications is a necessary means, but in practice the introduction of said systems is cumbersome and adoption is slow. We address the status quo of BI&A cost accounting for three types of stakeholders: users, developers and managers. The results show in which areas of application a strong need for action exists, and we identify major challenges for further research. Our findings indicate, for example, that managers regard cost accounting for BI&A as having a higher potential benefit than the other stakeholder types do, while also believing that they have already established a higher degree of implementation in their enterprises. We conclude that BI&A professionals have to consider these different perceptions to run a successful department and gain traction for BI&A cost accounting.

Introduction to the Minitrack on Big Data and Analytics: Pathways to Maturity
(2018-01-03) Kaisler, Stephen; Armour, Frank; Espinosa, Alberto