Text Analytics

Recent Submissions

  • Item
    An Investigation of Predictors of Information Diffusion in Social Media: Evidence from Sentiment Mining of Twitter Messages
    (2020-01-07) Salehan, Mohammad; Kim, Dan
    Social media have facilitated information sharing in social networks. Previous research shows that the sentiment of a text influences its diffusion in social media. Each emotion can be located in a three-dimensional space formed by the dimensions of valence (positive–negative), arousal (passive/calm–active/excited), and tension (tense–relaxed). While previous research has investigated the effect of emotional valence on information diffusion in social media, the effect of emotional arousal remains unexplored. This study examines how emotional arousal influences information diffusion in social media using a sentiment mining approach. We propose a research model and test it using data collected from Twitter.
  • Item
    Non-Exhaustive, Overlapping k-medoids for Document Clustering
    (2020-01-07) Kerstens, Eric
    Manual document categorization is time-consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when documents belong to only one group. However, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm, non-exhaustive overlapping k-medoids, inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity measures instead of distance measures and applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are primarily poor but show promise. Future research is described to improve the proposed algorithm and explore alternate evaluation measures.
  • Item
    Using Computational Text Mining to Understand Public Priorities for Disability Policy Towards Children in Canadian National Consultations
    (2020-01-07) Cogburn, Derrick; Shikako-Thomas, Keiko; Lai, Jonathan
    Identifying policy preferences from public consultations presents a challenge to national and local governments. Computational text mining approaches provide a useful strategy for analyzing the large-scale textual data emerging from these policy processes. In this study, we developed an inductive and deductive text mining approach to understand disability-related policy priorities. This approach is applied to data from the nationwide disability policy consultation conducted in 2016 by the Government of Canada. This process included 18 town hall meetings, 9 thematic roundtables, and online submissions from 92 stakeholders. Transcripts of these consultations were made available to researchers. Three broad research questions were asked of these data, focused on key themes; differences by city size and type of consultation; and the impact of two global policy frameworks. The study identified a number of key themes and found differences by city size. It also identified content related to both the CRPD and the CRC.
  • Item
    Towards an Integrative Approach for Automated Literature Reviews Using Machine Learning
    (2020-01-07) Tauchert, Christoph; Bender, Marco; Mesbah, Neda; Buxmann, Peter
    Because of the huge number of scientific publications, most of which are stored as unstructured data, the complexity and workload of the fundamental process of conducting literature reviews increase constantly. Based on previous literature, we develop an artifact that partially automates the literature review process, from collecting articles through to their evaluation. This artifact uses a custom crawler, the word2vec algorithm, LDA topic modeling, rapid automatic keyword extraction, and agglomerative hierarchical clustering to enable the automatic acquisition, processing, and clustering of relevant literature and the subsequent graphical presentation of the results using illustrations such as dendrograms. Moreover, the artifact provides information on which topics each cluster addresses and which keywords it contains. We evaluate our artifact based on an exemplary set of 308 publications. Our findings indicate that the developed artifact delivers better results than known previous approaches and can be a helpful tool to support researchers in conducting literature reviews.
  • Item
    Supporting Interview Analysis with Autocoding
    (2020-01-07) Kaufmann, Andreas; Barcomb, Ann; Riehle, Dirk
    Interview analysis is a technique employed in qualitative research. Researchers annotate (code) interview transcriptions, often with the help of Computer-Assisted Qualitative Data Analysis Software (CAQDAS). The tools available today largely replicate the manual process of annotation. In this article, we demonstrate how to use natural language processing (NLP) to increase the reproducibility and traceability of the process of applying codes to text data. We integrated an existing commercial machine-learning (ML) based concept extraction service into an NLP pipeline independent of domain-specific rules. We applied our prototype in three qualitative studies to evaluate its capabilities of supporting researchers by providing recommendations consistent with their initial work. Unlike rule-based approaches, our process can be applied to interviews from any domain, without additional burden to the researcher for creating a new ruleset. Our work using three example data sets indicates that this approach shows promise for real-life application, but further research is needed.
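
Illustrative note on the entry "Non-Exhaustive, Overlapping k-medoids for Document Clustering": the abstract says each object is assigned to zero, one, or many groups based on pairwise dissimilarity. The sketch below shows only that assignment idea and is not the paper's algorithm. It assumes the medoids are already chosen and uses a fixed dissimilarity threshold; the function name `assign_overlapping`, the threshold rule, and the toy matrix are all hypothetical.

```python
from typing import Dict, List, Sequence


def assign_overlapping(
    dissim: Sequence[Sequence[float]],
    medoids: Sequence[int],
    threshold: float,
) -> Dict[int, List[int]]:
    """Map each object index to every medoid within `threshold` dissimilarity.

    An object may end up with no medoid (an outlier, hence "non-exhaustive"),
    exactly one, or several (hence "overlapping").
    Assumption: the threshold rule is illustrative, not taken from the paper.
    """
    return {
        i: [m for m in medoids if row[m] <= threshold]
        for i, row in enumerate(dissim)
    }


# Hypothetical pairwise dissimilarities for five documents (0.0 = identical).
dissim = [
    [0.0, 0.2, 0.9, 0.8, 0.9],
    [0.2, 0.0, 0.3, 0.7, 0.9],
    [0.9, 0.3, 0.0, 0.2, 0.9],
    [0.8, 0.7, 0.2, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.0],
]

# Documents 0 and 2 are taken as medoids purely for illustration.
assignments = assign_overlapping(dissim, medoids=[0, 2], threshold=0.35)
for doc, groups in assignments.items():
    print(doc, groups)
```

In this toy input, document 1 is close to both medoids and so lands in two groups, while document 4 is far from everything and stays unassigned. Because only a dissimilarity matrix is required, the same function works for text or any other data for which pairwise dissimilarities can be computed.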