What Your Radiologist Might be Missing: Using Machine Learning to Identify Mislabeled Instances of X-ray Images

Date
2021-01-05
Authors
Rädsch, Tim
Eckhardt, Sven
Leiser, Florian
Pandl, Konstantin D.
Thiebes, Scott
Sunyaev, Ali
Starting Page
1294
Abstract
Label quality is an important and common problem in contemporary supervised machine learning research. Mislabeled instances in a data set might not only degrade the performance of machine learning models but also make it more difficult to explain, and thus trust, the predictions of those models. While extant research has focused mainly on the ex-ante improvement of label quality by proposing improvements to the labeling process, more recent research has started to investigate machine-learning-based approaches that automatically identify mislabeled instances in training data sets. In this study, we propose a two-stage pipeline for the automatic detection of potentially mislabeled instances in a large medical data set. Our results show that our pipeline successfully detects mislabeled instances, helping us to identify 7.4% of mislabeled instances of Cardiomegaly in the data set. With our research, we contribute to ongoing efforts regarding data quality in machine learning.
Keywords
Explainable Artificial Intelligence (XAI)
Extent
10 pages
Related To
Proceedings of the 54th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International