De-Identification of Privacy Sensitive Information in Resumes with GPT-4: An Utility Analysis for Automated Job Role Classification
Files
Date
2025-01-07
Contributor
Advisor
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Narrator
Transcriber
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Volume
Number/Issue
Starting Page
884
Ending Page
Alternative Title
Abstract
As organizations face the challenge of managing large amounts of data, privacy concerns have become increasingly prevalent when sharing sensitive privacy information with machine learning experts. This paper addresses the fundamental issue of privacy-sensitive information de-identification by introducing in-prompt de-identification, an approach that exploits the capabilities of large language models. Existing de-identification techniques often struggle to ensure complete privacy, and methods with higher privacy often result in a loss of data utility. In contrast, in-prompt de-identification is capable of generating synthetic, human-readable data samples from given inputs and bridges the gap between privacy and utility. With this article, we contribute to the de-identification of real-world resume data using in-prompt de-identification based on OpenAI’s GPT-4. Notably, our classification model, trained on GPT-4 generated data, shows no significant loss in performance compared to our baseline model trained on the original data.
Description
Keywords
AI Safety, Cybersecurity, and Inclusion through Advanced Text Analytics, de-identification, fairness, generative ai, privacy sensitive information, resume
Citation
Extent
10
Format
Geographic Location
Time Period
Related To
Proceedings of the 58th Hawaii International Conference on System Sciences
Related To (URI)
Table of Contents
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights Holder
Local Contexts
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.