Volatile correlation computation: A checkpoint view

Wenjun Zhou, Hui Xiong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

Recent years have witnessed increased interest in computing strongly correlated pairs in very large databases. Most previous studies have focused on static data sets. In real-world applications, however, input data are often dynamic and must be updated continually. Such large and growing data sets call for an incremental solution to correlation computing. Along this line, we propose CHECK-POINT, an algorithm that efficiently incorporates new transactions into the correlation computation as they become available. Specifically, we set a checkpoint to establish a computation buffer, which lets us determine an upper bound for the correlation. This checkpoint bound is used to identify a list of candidate pairs, whose correlations are maintained and recomputed as new transactions are added to the database. When the total number of new transactions exceeds the buffer size, a new checkpoint is set, a new upper bound is computed from it, and a new list of candidate pairs is identified. Experimental results on real-world data sets show that CHECK-POINT can significantly reduce the cost of correlation computing in dynamic data sets and makes compact use of memory space.
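To make the incremental setting concrete, the following is a minimal Python sketch of maintaining the φ correlation coefficient from running counts as transactions arrive. It is not the CHECK-POINT algorithm: the abstract does not state the checkpoint upper bound, so the sketch naively counts every co-occurring pair, which is exactly the cost that a candidate list derived from such a bound is meant to avoid. The class name `RunningPhi` and its methods are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


class RunningPhi:
    """Naive incremental phi computation (illustrative baseline only).

    Keeps a count for every item and every co-occurring pair. A
    checkpoint-style method would instead track only candidate pairs.
    """

    def __init__(self):
        self.n = 0                   # number of transactions seen so far
        self.item_count = Counter()  # item -> number of transactions containing it
        self.pair_count = Counter()  # (a, b) with a < b -> co-occurrence count

    def add(self, transaction):
        """Incorporate one new transaction (an iterable of items)."""
        items = sorted(set(transaction))
        self.n += 1
        self.item_count.update(items)
        self.pair_count.update(combinations(items, 2))

    def phi(self, a, b):
        """Phi coefficient of items a and b over all transactions seen so far."""
        a, b = sorted((a, b))
        n = self.n
        na, nb = self.item_count[a], self.item_count[b]
        nab = self.pair_count[(a, b)]
        denom = sqrt(na * nb * (n - na) * (n - nb))
        if denom == 0:
            # Undefined when an item appears in none or all transactions;
            # returning 0.0 is a convention assumed for this sketch.
            return 0.0
        return (n * nab - na * nb) / denom


# Usage: feed transactions one at a time, then query any pair.
stats = RunningPhi()
for t in [["a", "b"], ["a", "b"], ["a"], ["c"]]:
    stats.add(t)
phi_ab = stats.phi("a", "b")
```

Because only counts are stored, each new transaction updates the state without rescanning earlier data; the memory cost of `pair_count` grows with the number of distinct co-occurring pairs, which is what restricting attention to candidate pairs would reduce.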

Original language: English (US)
Title of host publication: KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 848-856
Number of pages: 9
State: Published - 2008
Event: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008 → Aug 27 2008

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country/Territory: United States
City: Las Vegas, NV
Period: 8/24/08 → 8/27/08

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • Checkpoint
  • Pearson's correlation coefficient
  • Volatile correlation computing
  • φ correlation coefficient
