Text Mining in Big Data Analytics

Recent Submissions

  • Towards Computational Assessment of Idea Novelty
    (2019-01-08) Wang, Kai; Dong, Boxiang; Ma, Junjie
    On crowdsourcing ideation websites, companies can easily collect a large number of ideas. Screening such a volume of ideas is costly and challenging, which calls for automatic approaches. Automatically evaluating idea novelty would be particularly useful, since companies commonly seek novel ideas. Three computational approaches were tested, based on Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and term frequency–inverse document frequency (TF-IDF), respectively. Each approach was applied to three sets of ideas, and the computed idea novelty was compared with human expert evaluation. The TF-IDF-based measure correlated better with expert evaluation than the other two measures. However, our results show that these approaches do not match human judgement well enough to replace it.
  • Analyzing Trends and Topics in Internet Governance and Cybersecurity Debates Found in Twelve Years of IGF Transcripts
    (2019-01-08) Cogburn, Derrick
    Internet Governance research produces a substantial body of innovative, interdisciplinary global scholarship. What are the key topics and themes in this research area, and how do they relate to cybersecurity? This paper answers these questions by analyzing transcripts from twelve years of the UN Internet Governance Forum (IGF), asking: (1) What key themes, topics, and entities are discussed at IGF? (2) Which issues have remained consistent at IGF, and which have changed? (3) To what extent is the NIST Cybersecurity Framework represented at IGF? Using the CRISP-DM approach to text mining, we find human rights to be the dominant IGF theme, followed by freedom of expression, with disability a persistent issue. In entity extraction, cybersecurity emerges prominently, as do blockchain and IoT. Topic modeling illustrates the resilience of human rights, but also identifies the IANA transition, accessibility, and “fake news.” Finally, the NIST Cybersecurity Framework is clearly represented in the data.
  • From Facebook to the Streets: Russian Troll Ads and Black Lives Matter Protests
    (2019-01-08) Etudo, Ugo; Yoon, Victoria Y; Yaraghi, Niam
    Online trolling is typically studied in the IS literature as an uncoordinated, anarchic activity. Coordinated, strategic online trolling is not well understood despite its prevalence on social media. To shed light on this activity, the present study examines the proposition that coordinated online trolling is timed to leverage macro-level societal unrest. To test this proposition, we analyze the dynamics of the Russian State’s coordinated trolling campaign against the United States beginning in 2015. Using the May 2018 release of all Russian troll Facebook advertisements, the study constructs a topic model of the ads’ content and examines the relationship between ad topics and the frequency of Black Lives Matter protests. We argue that the frequency of Black Lives Matter protests serves as a proxy for civil unrest and divisiveness in the United States. The study finds that Russian ads related to police brutality were issued to coincide with periods of higher unrest. It also finds that during periods of relative calm (evidenced by a lower frequency of protests), Russian ads were relatively innocuous.
  • Introduction to the Minitrack on Text Mining in Big Data Analytics
    (2019-01-08) Cogburn, Derrick; Hine, Michael; Peladeau, Normand; Yoon, Victoria Y
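
The abstract of "Towards Computational Assessment of Idea Novelty" reports that a TF-IDF-based measure tracked expert ratings most closely, but it does not give the formula. The following is a minimal Python sketch, assuming novelty is scored as one minus an idea's highest cosine similarity to any other idea in the same set. That definition is our assumption, not necessarily the authors' formulation, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `novelty_scores`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace split; a real pipeline would also strip
    # punctuation, drop stop words, and possibly stem.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many ideas each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times log(N / df); a term that appears in
        # every idea gets weight 0 because log(1) == 0.
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_scores(ideas):
    """Assumed novelty: 1 - max cosine similarity to any other idea.

    Weights are non-negative, so each score lies in [0, 1]; an idea that
    shares no weighted terms with any other idea scores exactly 1.0.
    """
    vectors = tfidf_vectors(ideas)
    scores = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        scores.append(1.0 - max(sims) if sims else 1.0)
    return scores
```

For example, `novelty_scores(["solar panels on store roofs", "solar panels on parking lot roofs", "drone delivery of groceries"])` gives the third idea a score of 1.0, since it shares no terms with the other two, while the two solar-panel ideas score below 1.0. Whether novelty should mean distance from the nearest idea or from the collection as a whole is a design choice the abstract leaves open.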