WeSAL: Applying Active Supervision to Find High-quality Labels at Industrial Scale

Date
2020-01-07
Authors
Nashaat, Mona
Ghosh, Aindrila
Miller, James
Quader, Shaikh
Abstract
Obtaining hand-labeled training data is one of the most tedious and expensive parts of the machine learning pipeline. Previous approaches, such as active learning, aim to optimize user engagement to acquire accurate labels. Other methods use weak supervision to generate low-quality labels at scale. In this paper, we propose a new hybrid method named WeSAL that combines Weak Supervision sources with Active Learning to keep humans in the loop. The method aims to generate large-scale training labels while enhancing their quality by involving domain experience. To evaluate WeSAL, we compare it with two state-of-the-art labeling techniques, Active Learning and Data Programming. The experiments use five publicly available datasets and a real-world dataset of 1.5M records provided by our industrial partner, IBM. The results indicate that WeSAL can generate large-scale, high-quality labels while reducing the labeling cost by up to 68% compared to active learning.
Keywords
Collaboration for Data Science, active learning, human-in-the-loop, machine learning, supervised learning, weak supervision
Extent
10 pages
Related To
Proceedings of the 53rd Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International