Defense Against Adversarial Attacks for Neural Representations of Text

Loading...
Thumbnail Image

Contributor

Advisor

Department

Instructor

Depositor

Speaker

Researcher

Consultant

Interviewer

Interviewee

Narrator

Transcriber

Annotator

Journal Title

Journal ISSN

Volume Title

Publisher

Volume

Number/Issue

Starting Page

7592

Ending Page

Alternative Title

Abstract

In this paper, we focus on defending against adversarial attacks for privacy-preserving Natural Language Processing (NLP) under a model partitioning scenario, where the model splits into a local, on-device part and a remote, cloud-based part. Model partitioning improves the scalability and protects the privacy of inputs into the model. However, we argue that privacy protection breaks during inference with model partitioning. In this paper, an adversary eavesdrops on the hidden representations output from the local devices and tries to use the representations to obtain private information from the input text. We study two types of adversarial attacks, i.e., adversarial classification and adversarial generation. Based on these two attack models, we correspondingly propose two defenses: defending the adversarial classification (DAC) and defending the adversarial generation (DAG). Specifically, the DAC and DAG approaches are both bilevel optimization-based defense methods. Both methods optimally modify a subpopulation of the neural representations that are subject to maximally decreasing the adversary’s ability. The representations trained with this bilevel optimization protect sensitive information from the adversary attack while maintaining their utility for downstream tasks. Our experiments show that both DAC and DAG approaches improve the performance of the main text classifier and achieve even higher privacy of neural representations compared with the current state-of-the-art methods.

Description

Citation

Extent

10 pages

Format

Geographic Location

Time Period

Related To

Proceedings of the 57th Hawaii International Conference on System Sciences

Related To (URI)

Table of Contents

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Rights Holder

Catalog Record

Local Contexts

Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.