WearID: Low-Effort Wearable-Assisted Authentication of Voice Commands via Cross-Domain Comparison without Training

Cong Shi, Yan Wang, Yingying Chen, Nitesh Saxena, Chen Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Due to the open nature of voice input, voice assistant (VA) systems (e.g., Google Home and Amazon Alexa) are vulnerable to various security and privacy leakages (e.g., credit card numbers, passwords), especially when users issue critical commands involving large purchases, critical calls, etc. Though existing VA systems may employ voice features to identify users, they are still vulnerable to various acoustic-based attacks (e.g., impersonation, replay, and hidden command attacks). In this work, we propose a training-free voice authentication system, WearID, which leverages the cross-domain speech similarity between the audio domain and the vibration domain to provide enhanced security for the ever-growing deployment of VA systems. In particular, when a user gives a critical command, WearID exploits motion sensors on the user's wearable device to capture the aerial speech in the vibration domain and verifies it against the speech captured in the audio domain via the VA device's microphone. Compared to existing approaches, our solution is low-effort and privacy-preserving, as it requires neither users' active input (e.g., replying to messages/calls) nor the storage of users' privacy-sensitive voice samples for training. In addition, our solution exploits the distinct vibration sensing interface and its short sensing range for sound (e.g., 25 cm) to verify voice commands. Examining the similarity of the two domains' data is not trivial. The huge sampling rate gap (e.g., 8000 Hz vs. 200 Hz) between the audio and vibration domains makes it hard to compare the two domains' data directly, and even slight noise in the data can be magnified and cause authentication failures. To address these challenges, we investigate the complex relationship between the two sensing domains and develop a spectrogram-based algorithm to convert the microphone data into lower-frequency "motion sensor data" to facilitate cross-domain comparisons. We further develop a user authentication scheme to verify that the received voice command originates from the legitimate user, based on the cross-domain speech similarity of the received voice commands. We report on extensive experiments to evaluate WearID under various audible and inaudible attacks. The results show that WearID can verify voice commands with 99.8% accuracy in the normal situation and detect 97.2% of fake voice commands from various attacks, including impersonation/replay attacks and hidden voice/ultrasound attacks.
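To make the sampling-rate gap concrete, the following is a minimal sketch of a naive time-domain comparison between a microphone recording and a motion-sensor recording. It is not the spectrogram-based conversion described in the abstract: it block-averages the audio down to the vibration rate and scores the result with a Pearson correlation. The function names (`decimate_by_averaging`, `pearson`, `cross_domain_similarity`, `tone`) and the synthetic test tones are assumptions made for illustration only; the two sampling rates are the example values quoted in the abstract.

```python
import math

AUDIO_RATE_HZ = 8000      # example microphone rate quoted in the abstract
VIBRATION_RATE_HZ = 200   # example motion-sensor rate quoted in the abstract


def decimate_by_averaging(samples, factor):
    """Average each block of `factor` samples (crude low-pass plus downsampling)."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


def pearson(x, y):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cross_domain_similarity(audio, vibration):
    """Hypothetical stand-in score: bring audio to the vibration rate, then correlate."""
    factor = AUDIO_RATE_HZ // VIBRATION_RATE_HZ  # 40 audio samples per vibration sample
    return pearson(decimate_by_averaging(audio, factor), vibration)


def tone(freq_hz, rate_hz, seconds=1.0):
    """Synthetic sine tone used only to exercise the sketch."""
    count = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(count)]


if __name__ == "__main__":
    audio = tone(20, AUDIO_RATE_HZ)
    print(cross_domain_similarity(audio, tone(20, VIBRATION_RATE_HZ)))  # matching tone
    print(cross_domain_similarity(audio, tone(35, VIBRATION_RATE_HZ)))  # different tone
```

With these synthetic tones, the matching pair scores high and the mismatched pair scores near zero. Block averaging discards almost all speech energy above roughly 100 Hz, which is one way to see why a direct comparison of real speech is hard and why the authors turn to a spectrogram-based conversion instead.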

Original language: English (US)
Title of host publication: Proceedings - 36th Annual Computer Security Applications Conference, ACSAC 2020
Publisher: Association for Computing Machinery
Pages: 829-842
Number of pages: 14
ISBN (Electronic): 9781450388580
State: Published - Dec 7 2020
Externally published: Yes
Event: 36th Annual Computer Security Applications Conference, ACSAC 2020 - Virtual, Online, United States
Duration: Dec 7 2020 to Dec 11 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 36th Annual Computer Security Applications Conference, ACSAC 2020
Country/Territory: United States
City: Virtual, Online
Period: 12/7/20 to 12/11/20

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • Motion Sensor
  • User Authentication
  • Voice Assistant Systems
