Automated topic analysis for restricted scope health corpora: methodology and comparison with human performance

dc.contributor.author Maeder, Anthony
dc.contributor.author Tieman, Jennifer
dc.contributor.author Naveda, Bertha
dc.contributor.author Champion, Stephanie
dc.contributor.author Agnew, Tamara
dc.date.accessioned 2020-12-24T19:08:00Z
dc.date.available 2020-12-24T19:08:00Z
dc.date.issued 2021-01-05
dc.description.abstract This paper addresses the problem of identifying topics that describe the information content of restricted-size sets of scientific papers extracted from publication databases. Conventional computational approaches, based on natural language processing with unsupervised classification algorithms, typically require large numbers of papers to achieve adequate training. The method presented here instead uses a simpler word-frequency-based approach coupled with context modeling. An example is provided of its application to corpora resulting from a curated literature search site for COVID-19 research publications. The results are compared with those of a conventional human-based approach and show partial overlap in the topics identified. The findings suggest that computational approaches may provide an alternative to human expert topic analysis, provided adequate contextual models are available.
dc.format.extent 7 pages
dc.identifier.doi 10.24251/HICSS.2021.095
dc.identifier.isbn 978-0-9981331-4-0
dc.identifier.uri http://hdl.handle.net/10125/70706
dc.language.iso English
dc.relation.ispartof Proceedings of the 54th Hawaii International Conference on System Sciences
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject Text Analytics
dc.subject topic analysis
dc.subject natural language processing
dc.subject keyword extraction
dc.subject term frequency
dc.title Automated topic analysis for restricted scope health corpora: methodology and comparison with human performance
prism.startingpage 775
Files
Original bundle
Name: 0077.pdf
Size: 389.7 KB
Format: Adobe Portable Document Format