Text Mining in Big Data Analytics

Recent Submissions

  • Item
    Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: A Lesson of History
    (2018-01-03) Peladeau, Normand ; Davoodi, Elnaz
    Topic modeling is often perceived as a relatively new development in information retrieval sciences, and new methods such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation have generated a lot of research. However, attempts to extract topics from unstructured text using Factor Analysis techniques can be found as early as the 1960s. This paper compares the perceived coherence of topics extracted from three different datasets using Factor Analysis and Latent Dirichlet Allocation. To perform such a comparison, a new extrinsic evaluation method is proposed. Results suggest that Factor Analysis can produce topics perceived by human coders as more coherent than those produced by Latent Dirichlet Allocation, and they warrant a revisit of a topic extraction method developed more than fifty-five years ago, yet forgotten.
  • Item
    Text Mining Narrative Survey Responses to Develop Engagement Scale Items
    (2018-01-03) Ford, John ; Nierle, Doug ; Leeds, Peter ; Stetz, Thomas
    A sixteen-item employee engagement scale was supplemented with items developed from literature review, from related scales, and from text mining narrative responses to an open-ended question about improving employee performance. The text mining procedure is described and may be useful to other scale developers. Some items derived from text mining performed as well as those developed using traditional methods. Possible modifications and extensions of the method are suggested.
  • Item
    Enhancing Scientific Collaboration Through Knowledge Base Population and Linking for Meetings
    (2018-01-03) Gao, Ning ; Dredze, Mark ; Oard, Douglas
    Recent research on scientific collaboration shows that distributed interdisciplinary collaborations report comparatively poor outcomes, and the inefficiency of the coordination mechanisms is partially responsible for the problems. To improve information sharing between past collaborators and future team members, or reuse of collaboration records from one project by future researchers, this paper describes systems that automatically construct a knowledge base of the meetings from the calendars of participants, and that then link references to those meetings found in email messages to the corresponding meeting in the knowledge base. This is work in progress in which experiments with a publicly available corporate email collection with calendar entries show that the knowledge base population function achieves high precision (0.98, meaning that almost all knowledge base entities are actually meetings) and that the accuracy of the linking from email messages to knowledge base entries (0.90) is already quite good.
  • Item
    On the Patent Claim Eligibility Prediction Using Text Mining Techniques
    (2018-01-03) Lai, Chia-Yu ; Hwang, San-Yih ; Wei, Chih-Ping
    With the widespread adoption of computer software in recent decades, software patents have become controversial for the patent system. Among the many patentability requirements, patentable subject matter serves a gatekeeping function that prevents a patent from preempting future innovation. Software patents may easily fall into the gray area of abstract ideas, whose allowance may hinder future innovation. However, without a clear definition of abstract ideas, determining the subject matter eligibility of patent claims is a challenging task for examiners and applicants. In this research, to address software patent eligibility issues, we propose an effective model to determine patent claim eligibility using text mining and machine learning techniques. Drawing upon USPTO-issued guidelines, we identify 66 patent cases to design domain knowledge features, including abstractness features and distinguishable word features, as well as other textual features, to develop the claim eligibility prediction model. The experimental results show that our proposed model reaches an accuracy of more than 80%, and that domain knowledge features play a crucial role in our prediction model.
  • Item
    Introduction to the Minitrack on Text Mining in Big Data Analytics
    (2018-01-03) Cogburn, Derrick L. ; Hine, Mike ; Peladeau, Normand ; Yoon, Victoria