Piccolo: Exposing Complex Backdoors in NLP Transformer Models

Yingqi Liu, Guangyu Shen, Guanhong Tao, Shengwei An, Shiqing Ma, Xiangyu Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Backdoors can be injected into NLP models so that they misbehave when trigger words or sentences appear in an input sample. Detecting such backdoors given only a subject model and a small number of benign samples is very challenging because of the unique nature of NLP applications, such as the discontinuity of the pipeline and the large search space. Existing techniques work well for backdoors with simple triggers, such as single-character or single-word triggers, but become less effective when triggers and models become complex (e.g., transformer models). We propose a new backdoor scanning technique. It transforms a subject model into an equivalent but differentiable form. It then uses optimization to invert a distribution of words denoting their likelihood of being in the trigger. It leverages a novel word discriminativity analysis to determine whether the subject model is particularly discriminative for the presence of likely trigger words. Our evaluation on 3839 NLP models from the TrojAI competition and existing works, with 7 state-of-the-art complex structures such as BERT and GPT and 17 different attack types, including the two latest dynamic attacks, shows that our technique is highly effective, achieving over 0.9 detection accuracy in most scenarios and substantially outperforming two state-of-the-art scanners. Our submissions to the TrojAI leaderboard achieve top performance in 2 out of the 3 rounds for NLP backdoor scanning.
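The idea of inverting a distribution of words by optimization can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical linear scorer in place of a transformer, a random unit-norm embedding table, and made-up names (`invert_word_distribution`, `TRIGGER_ID`, `target_direction`). It shows only why replacing a discrete word choice with a probability vector over the vocabulary makes the input differentiable, so gradient ascent can concentrate probability on the word the scorer reacts to most strongly.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def invert_word_distribution(embeddings, target_direction, steps=200, lr=5.0):
    """Gradient ascent on vocabulary logits (toy stand-in, not Piccolo itself).

    The "soft word" fed to the scorer is p @ embeddings, where p = softmax(logits).
    The toy scorer is linear: score = (p @ embeddings) @ target_direction.
    """
    vocab_size = embeddings.shape[0]
    logits = np.zeros(vocab_size)  # start from a uniform distribution
    # Score the toy model would assign to each individual word.
    per_word_score = embeddings @ target_direction
    for _ in range(steps):
        p = softmax(logits)
        expected = p @ per_word_score
        # Analytic gradient of the score with respect to the logits.
        grad = p * (per_word_score - expected)
        logits += lr * grad
    return softmax(logits)


# Hypothetical setup: 50 words, 16-dimensional unit-norm embeddings.
rng = np.random.default_rng(0)
VOCAB, DIM, TRIGGER_ID = 50, 16, 7
E = rng.normal(size=(VOCAB, DIM))
E /= np.linalg.norm(E, axis=1, keepdims=True)

# Assumption: the "backdoored" toy scorer is aligned with the trigger word's embedding.
w = E[TRIGGER_ID].copy()

p = invert_word_distribution(E, w)
likely = int(np.argmax(p))
print("most likely trigger word id:", likely, "probability:", float(p[likely]))
```

In this toy setting the recovered distribution peaks at the planted word because its embedding has the largest inner product with the scorer's weight vector. A real scanner would have to differentiate through the full model and then check, as the abstract describes, whether the model is unusually discriminative for the likely words.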

Original language: English (US)
Title of host publication: Proceedings - 43rd IEEE Symposium on Security and Privacy, SP 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2025-2042
Number of pages: 18
ISBN (Electronic): 9781665413169
State: Published - 2022
Event: 43rd IEEE Symposium on Security and Privacy, SP 2022 - San Francisco, United States
Duration: May 23 2022 to May 26 2022

Publication series

Name: Proceedings - IEEE Symposium on Security and Privacy
Volume: 2022-May
ISSN (Print): 1081-6011

Conference

Conference: 43rd IEEE Symposium on Security and Privacy, SP 2022
Country/Territory: United States
City: San Francisco
Period: 5/23/22 to 5/26/22

All Science Journal Classification (ASJC) codes

  • Safety, Risk, Reliability and Quality
  • Software
  • Computer Networks and Communications
