Sampling algorithms in a stream operator

Theodore Johnson, S. Muthukrishnan, Irina Rozenbaum

Research output: Contribution to journalConference articlepeer-review

53 Scopus citations

Abstract

Complex queries over high speed data streams often need to rely on approximations to keep up with their input. The research community has developed a rich literature on approximate streaming algorithms for this application. Many of these algorithms produce samples of the input stream, providing better properties than conventional random sampling. In this paper, we abstract the stream sampling process and design a new stream sample operator. We show how it can be used to implement a wide variety of algorithms that perform sampling and sampling-based aggregations. Also, we show how to implement the operator in Gigascope - a high speed stream database specialized for IP network monitoring applications. As an example study, we apply the operator within such an enhanced Gigascope to perform subset-sum sampling which is of great interest for IP network management. We evaluate this implemention on a live, high speed internet traffic data stream and find that (a) the operator is a flexible, versatile addition to Gigascope suitable for tuning and algorithm engineering, and (b) the operator imposes only a small evaluation overhead. This is the first operational implementation we know of, for a wide variety of stream sampling algorithms at line speed within a data stream management system.

Original languageEnglish (US)
Pages (from-to)1-12
Number of pages12
JournalProceedings of the ACM SIGMOD International Conference on Management of Data
DOIs
StatePublished - 2005
EventSIGMOD 2005: ACM SIGMOD International Conference on Management of Data - Baltimore, MD, United States
Duration: Jun 14 2005Jun 16 2005

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Fingerprint Dive into the research topics of 'Sampling algorithms in a stream operator'. Together they form a unique fingerprint.

Cite this