Mining and Representing Unstructured Nicotine Use Data in a Structured Format for Secondary Use

Date
2019-01-08
Authors
Ngwenya, Mandlenkosi
Bankole, Felix
Contributor
Advisor
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Volume
Number/Issue
Starting Page
Ending Page
Alternative Title
Abstract
The objective of this study was to use rules, NLP and machine learning for addressing the problem of clinical data interoperability across healthcare providers. Addressing this problem has the potential to make clinical data comparable, retrievable and exchangeable between healthcare providers. Our focus was in giving structure to unstructured patient smoking information. We collected our data from the MIMIC-III database. We wrote rules for annotating the data, then trained a CRF sequence classifier. We obtained an f-measure of 86%, 72%, 69%, 80%, and 12% for substance smoked, frequency, amount, temporal, and duration respectively. Amount smoked yielded a small value due to scarcity of related data. Then for smoking status we obtained an f-measure of 94.8% for non-smoker class, 83.0% for current-smoker, and 65.7% for past-smoker. We created a FHIR profile for mapping the extracted data based on openEHR reference models, however in future we will explore mapping to CIMI models.
Description
Keywords
Big Data on Healthcare Application, Information Technology in Healthcare, Big data, Conditional Random Fields, FHIR profiles, Natural language processing, Unstructured health data
Citation
Extent
10 pages
Format
Geographic Location
Time Period
Related To
Proceedings of the 52nd Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights Holder
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.