Mining and Representing Unstructured Nicotine Use Data in a Structured Format for Secondary Use

Ngwenya, Mandlenkosi
Bankole, Felix
The objective of this study was to use rules, natural language processing (NLP), and machine learning to address the problem of clinical data interoperability across healthcare providers. Addressing this problem has the potential to make clinical data comparable, retrievable, and exchangeable between healthcare providers. Our focus was on giving structure to unstructured patient smoking information. We collected our data from the MIMIC-III database, wrote rules for annotating the data, and then trained a conditional random field (CRF) sequence classifier. We obtained F-measures of 86%, 72%, 69%, 80%, and 12% for substance smoked, frequency, amount, temporal, and duration, respectively. The score for amount smoked was low because related data were scarce. For smoking status, we obtained F-measures of 94.8% for the non-smoker class, 83.0% for the current-smoker class, and 65.7% for the past-smoker class. We created a FHIR profile for mapping the extracted data, based on openEHR reference models; in future work we will explore mapping to CIMI models.
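To illustrate the rule-based annotation step mentioned above, the following is a minimal sketch of how regular-expression rules might label tokens in a smoking-history sentence before the labelled sequences are passed to a CRF trainer. The rule patterns, the label names, the BIO tagging scheme, and the function name `annotate` are all assumptions made for illustration; the abstract does not describe the authors' actual rules or tooling.

```python
import re

# Hypothetical rules (not the authors'): one regex per entity type
# named in the abstract. Earlier rules take priority on overlap.
RULES = [
    ("SUBSTANCE", re.compile(r"\b(cigarettes?|cigars?|pipe|tobacco)\b", re.I)),
    ("FREQUENCY", re.compile(r"\b(daily|weekly|per day|a day|occasionally)\b", re.I)),
    ("AMOUNT", re.compile(r"\b\d+(\.\d+)?\s*(packs?|ppd)\b", re.I)),
    ("DURATION", re.compile(r"\bfor\s+\d+\s+(years?|months?)\b", re.I)),
    ("TEMPORAL", re.compile(r"\b(\d+\s+(years?|months?)\s+ago|since\s+\d{4})\b", re.I)),
]


def annotate(text):
    """Return (token, BIO-label) pairs for one sentence."""
    # Split into word tokens and single punctuation marks, keeping offsets.
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\w+|[^\w\s]", text)]
    labels = ["O"] * len(tokens)
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            first = True
            for i, (_, start, end) in enumerate(tokens):
                inside = start >= match.start() and end <= match.end()
                if inside and labels[i] == "O":
                    # First token of a span gets B-, the rest get I-.
                    labels[i] = ("B-" if first else "I-") + label
                    first = False
    return [(tok, lab) for (tok, _, _), lab in zip(tokens, labels)]


if __name__ == "__main__":
    for token, label in annotate("Smokes 1 pack of cigarettes per day for 20 years"):
        print(f"{token}\t{label}")
```

Token and label pairs of this shape are a common input format for sequence classifiers such as CRFs, which can then generalise beyond the literal patterns the rules cover.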
Big Data on Healthcare Application, Information Technology in Healthcare, Big data, Conditional Random Fields, FHIR profiles, Natural language processing, Unstructured health data