Text Mining in Big Data Analytics Minitrack
Global collaborations, social media, and information systems of all types generate enormous amounts of textual data (e.g., email archives, websites, blog posts, meeting transcripts, speeches, annual reports, published material, and social media posts). While this unstructured textual data is readily available, it presents a tremendous challenge to researchers trying to analyze these large bodies of text with traditional social science methods. Text mining in big data analytics is becoming increasingly important to an interdisciplinary group of scholars, practitioners, government officials, and international organizations. For example, the American Association for the Advancement of Science (AAAS) launched a new competition in 2014 on Big Data and Analytics within its highly competitive senior executive branch fellowship program. Similarly, the Big Boulder Initiative, a corporate effort, was formed in 2014 as a trade association wholly dedicated to "the advancement of social data in businesses and organizations of all kinds" (http://www.bbi.org/).
However, within this growing focus on Big Data, there is a dilemma. While as much as 75-80% of available data is unstructured text, many people are not trained in the techniques for analyzing large bodies of text. This minitrack is designed to contribute to the growing big data focus at HICSS and invites papers that apply text-mining approaches to a wide variety of substantive domains, including, but not limited to, theoretical and applied approaches to analyzing:
- Blog posts
- Twitter and other social media
- Email archives
- Published articles
- Websites and blogs
- Meeting transcripts
And addressing methodological challenges, such as:
- Automated acquisition and cleaning of data
- Working on distributed, high-performance computers
- Overcoming API limitations
- Using latent Dirichlet allocation (LDA), latent semantic analysis (LSA), and other techniques
Derrick L. Cogburn (Primary Contact)
Michael J. Hine
Modeling Twitter Engagement in Real-World Events (2017-01-04)
Twitter offers tremendous opportunities for people to engage with real-world events (e.g., political elections) by sharing information and communicating about these events. However, little is understood about the factors that affect people’s Twitter engagement (e.g., posting) in such real-world events. This paper examines multiple predictive factors associated with four different perspectives of users’ Twitter engagement, and quantifies their potential influence on predicting the (i) presence and (ii) degree of the user’s engagement with real-world events. We find that the measures of people’s prior Twitter activities, topical interests, geolocation, and social network structures are all variously correlated with their engagement with real-world events.
Building an Environmental Sustainability Dictionary for the IT Industry (2017-01-04)
Content analysis is a commonly utilized methodology in corporate sustainability research. However, because most corporate sustainability research using content analysis is based on human coding, both the research capability and the scope of the research design are limited. The relatively recent technique of text mining addresses some of the limitations of manual content analysis, but its usage often depends on the development of a domain-specific dictionary. This paper develops an environmental sustainability dictionary in the context of corporate sustainability reports for the IT industry. In support of building this dictionary, we develop a standardized dictionary-building process model that can be applied across many domains.
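To make the idea of dictionary-based text mining concrete, the following is a minimal sketch of how a domain-specific dictionary might be applied to a report. It is an illustration only, not the paper's dictionary or process model: the categories, terms, sample sentence, and the names SUSTAINABILITY_DICTIONARY, tokenize, and score_document are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (category -> terms); NOT taken from the paper.
SUSTAINABILITY_DICTIONARY = {
    "energy": {"energy", "renewable", "electricity"},
    "emissions": {"carbon", "emissions", "greenhouse"},
    "waste": {"recycling", "waste", "e-waste"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def score_document(text, dictionary):
    """Count how many tokens in the text belong to each dictionary category."""
    counts = Counter()
    for token in tokenize(text):
        for category, terms in dictionary.items():
            if token in terms:
                counts[category] += 1
    # Report every category, including those with zero hits.
    return {category: counts[category] for category in dictionary}


# Made-up sentence standing in for a corporate sustainability report.
report = ("We cut carbon emissions by buying renewable energy "
          "and expanded e-waste recycling.")
scores = score_document(report, SUSTAINABILITY_DICTIONARY)
print(scores)
```

Exact-match counting like this is the simplest possible design. A real dictionary would likely also need stemming, multi-word phrases, and validation against human coding.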
An Ontological Approach to Misinformation: Quickly Finding Relevant Information (2017-01-04)
Identifying misinformation (i.e., rumors) is a growing area of research in the information systems field. This is because during recent tragedies (e.g., the Boston bombings and Ebola), rumors spread rapidly on social media platforms, obscuring the facts about an event. This results in rumors being spread even more, further hiding the events. In this study, we draw on research from the semantic web to tackle this problem. We propose that the use of ontologies and related concepts can help find accurate information for a case quickly. Combined with a weighting formula, we will be able to display the most relevant results to an interested party. In this research in progress, we outline our plan for accomplishing this once an ontology and dataset are found.