Identifying a critical threat to privacy through automatic image classification

David Lorenzi, Jaideep Vaidya

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

Image classification is generally considered a hard problem, yet it is necessary for many useful applications such as automatic target recognition. Indeed, no general method works across varying scenarios while still achieving good performance across the board. In this paper, we identify an interesting case in which image classification is dangerously easy: accurately classifying images that contain highly sensitive data, such as driver's licenses, credit cards, and passports. Our key contribution is a Hierarchical Temporal Memory (HTM) network that classifies many sensitive images with over 90% accuracy, which we use to develop a system that automatically derives and transcribes sensitive information from image data. The system classifies images into two groups, sensitive and non-sensitive, and the sensitive group can then be analyzed further. This is a real-world security issue that could easily lead to privacy problems such as identity theft, since scans of passports and driver's licenses are routinely emailed or kept in digital form, and many local documents are left unencrypted. In essence, an attacker can use data mining and machine learning techniques very effectively to breach individual privacy. Our main contribution is therefore to demonstrate the efficacy of image classification for deriving sensitive information, which could also serve as a guide for other applications such as document detection and analysis. The work also serves as a warning against leaving data unencrypted and shows once more that security through obscurity is not enough.

Original language: English (US)
Title of host publication: CODASPY'11 - Proceedings of the 1st ACM Conference on Data and Application Security and Privacy
Pages: 157-167
Number of pages: 11
State: Published - 2011
Event: 1st ACM Conference on Data and Application Security and Privacy, CODASPY'11 - San Antonio, TX, United States
Duration: Feb 21 2011 to Feb 23 2011

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications

Keywords

  • Image classification
  • Neural networks
  • Privacy
