Outlier detection by example

Cui Zhu, Hiroyuki Kitagawa, Spiros Papadimitriou, Christos Faloutsos

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Outlier detection is a useful technique in such areas as fraud detection, financial analysis and health monitoring. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier (e.g., distance-based, density-based, etc.). However, the definition of an outlier differs between users or even datasets. This paper presents a solution to this problem by including input from the users. Our OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets. By incorporating a small number of such examples, OBE can successfully develop an algorithm by which to identify further outliers based on their outlierness. Several algorithmic challenges and engineering decisions must be addressed in building such a system. We describe the key design decisions and algorithms in this paper. In order to interact with users having different degrees of domain knowledge, we develop two detection schemes: OBE-Fraction and OBE-RF. Our experiments on both real and synthetic datasets demonstrate that OBE can discover values that a user would consider outliers.
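The following minimal Python sketch shows the general idea of deriving an outlier rule from a few user-supplied examples in a low-dimensional dataset. It is not the OBE algorithm. The multi-scale "neighbor fraction" profile, the fixed radii, and the nearest-centroid decision rule are all assumptions chosen for brevity. The helper names `neighbor_profile` and `outliers_by_example` are hypothetical.

```python
import math


def neighbor_profile(points, radii):
    """Return, for each point, the fraction of other points within each radius.

    Low fractions at several scales suggest an isolated point. This profile is
    a simplified, hypothetical stand-in for "outlierness", not the paper's features.
    """
    n = len(points)
    profiles = []
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        profiles.append([sum(d <= r for d in dists) / (n - 1) for r in radii])
    return profiles


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]


def outliers_by_example(points, example_ids, radii=(0.5, 1.0, 2.0)):
    """Flag points whose profile is closer to the user's examples than to the rest.

    A nearest-centroid rule is used only for illustration. It is an assumed
    substitute for whatever classifier the paper actually trains.
    """
    examples = set(example_ids)
    if not examples or len(examples) >= len(points):
        raise ValueError("examples must be a non-empty, proper subset of the points")
    profiles = neighbor_profile(points, radii)
    pos = centroid([profiles[i] for i in examples])
    neg = centroid([pr for i, pr in enumerate(profiles) if i not in examples])
    return [
        i for i, pr in enumerate(profiles)
        if math.dist(pr, pos) < math.dist(pr, neg)
    ]
```

Consider a tight cluster near the origin plus two distant points. Labeling only index 5 as an outlier also flags index 6, because its profile resembles the example's. A fixed distance-based rule would instead require the user to choose a radius and threshold in advance. In this sketch, the examples take over that role.

```python
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (-9, 8)]
print(outliers_by_example(pts, [5]))  # [5, 6]
```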

Original language: English (US)
Pages (from-to): 217-247
Number of pages: 31
Journal: Journal of Intelligent Information Systems
Volume: 36
Issue number: 2
DOI: 10.1007/s10844-010-0128-1
State: Published - Apr 1 2011
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Keywords

  • Data mining
  • Machine learning
  • Outlier detection
  • Outlier example

Cite this

Zhu, Cui ; Kitagawa, Hiroyuki ; Papadimitriou, Spiros ; Faloutsos, Christos. / Outlier detection by example. In: Journal of Intelligent Information Systems. 2011 ; Vol. 36, No. 2. pp. 217-247.
@article{cdb72b6820af476a8778963b74017cb8,
title = "Outlier detection by example",
abstract = "Outlier detection is a useful technique in such areas as fraud detection, financial analysis and health monitoring. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier (e.g., distance-based, density-based, etc.). However, the definition of an outlier differs between users or even datasets. This paper presents a solution to this problem by including input from the users. Our OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets. By incorporating a small number of such examples, OBE can successfully develop an algorithm by which to identify further outliers based on their outlierness. Several algorithmic challenges and engineering decisions must be addressed in building such a system. We describe the key design decisions and algorithms in this paper. In order to interact with users having different degrees of domain knowledge, we develop two detection schemes: OBE-Fraction and OBE-RF. Our experiments on both real and synthetic datasets demonstrate that OBE can discover values that a user would consider outliers.",
keywords = "Data mining, Machine learning, Outlier detection, Outlier example",
author = "Cui Zhu and Hiroyuki Kitagawa and Spiros Papadimitriou and Christos Faloutsos",
year = "2011",
month = "4",
day = "1",
doi = "10.1007/s10844-010-0128-1",
language = "English (US)",
volume = "36",
pages = "217--247",
journal = "Journal of Intelligent Information Systems",
issn = "0925-9902",
publisher = "Springer Netherlands",
number = "2",
}

Outlier detection by example. / Zhu, Cui; Kitagawa, Hiroyuki; Papadimitriou, Spiros; Faloutsos, Christos.

In: Journal of Intelligent Information Systems, Vol. 36, No. 2, 01.04.2011, p. 217-247.

TY - JOUR

T1 - Outlier detection by example

AU - Zhu, Cui

AU - Kitagawa, Hiroyuki

AU - Papadimitriou, Spiros

AU - Faloutsos, Christos

PY - 2011/4/1

Y1 - 2011/4/1

AB - Outlier detection is a useful technique in such areas as fraud detection, financial analysis and health monitoring. Many recent approaches detect outliers according to reasonable, pre-defined concepts of an outlier (e.g., distance-based, density-based, etc.). However, the definition of an outlier differs between users or even datasets. This paper presents a solution to this problem by including input from the users. Our OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets. By incorporating a small number of such examples, OBE can successfully develop an algorithm by which to identify further outliers based on their outlierness. Several algorithmic challenges and engineering decisions must be addressed in building such a system. We describe the key design decisions and algorithms in this paper. In order to interact with users having different degrees of domain knowledge, we develop two detection schemes: OBE-Fraction and OBE-RF. Our experiments on both real and synthetic datasets demonstrate that OBE can discover values that a user would consider outliers.

KW - Data mining

KW - Machine learning

KW - Outlier detection

KW - Outlier example

UR - http://www.scopus.com/inward/record.url?scp=79952191748&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952191748&partnerID=8YFLogxK

U2 - 10.1007/s10844-010-0128-1

DO - 10.1007/s10844-010-0128-1

M3 - Article

AN - SCOPUS:79952191748

VL - 36

SP - 217

EP - 247

JO - Journal of Intelligent Information Systems

JF - Journal of Intelligent Information Systems

SN - 0925-9902

IS - 2

ER -