Engineering the compression of massive tables: An experimental approach

Adam L. Buchsbaum, Donald F. Caldwell, Kenneth W. Church, Glenn S. Fowler, S. Muthukrishnan

Research output: Contribution to conference › Paper › peer-review

35 Scopus citations

Abstract

We study the problem of compressing massive tables. We devise a novel compression paradigm - training for lossless compression - which assumes that the data exhibit dependencies that can be learned by examining a small amount of training material. We develop an experimental methodology to test the approach. Our result is a system, pzip, which outperforms gzip by factors of two in compression size and both compression and uncompression time for various tabular data. Pzip is now in production use in an AT&T network traffic data warehouse.
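The idea of training for lossless compression can be illustrated with a minimal sketch. This is not pzip's algorithm, which the abstract does not describe; it only assumes that a table of fixed-width rows may compress better in one layout than another, and that a small training sample is enough to decide which. The function names (`transpose`, `train`, `compress`, `decompress`) and the two candidate layouts are hypothetical, and Python's standard `zlib` module stands in for gzip.

```python
import zlib


def transpose(rows: list[bytes]) -> bytes:
    """Lay out fixed-width rows column by column (column-major order)."""
    return b"".join(bytes(col) for col in zip(*rows))


def train(sample: list[bytes]) -> str:
    """Pick the layout that compresses the small training sample best."""
    sizes = {
        "row": len(zlib.compress(b"".join(sample))),
        "column": len(zlib.compress(transpose(sample))),
    }
    return min(sizes, key=sizes.get)


def compress(rows: list[bytes], layout: str) -> bytes:
    """Compress the full table using the layout chosen during training."""
    data = transpose(rows) if layout == "column" else b"".join(rows)
    return zlib.compress(data)


def decompress(blob: bytes, layout: str, width: int) -> list[bytes]:
    """Invert compress(); width is the fixed row length in bytes."""
    data = zlib.decompress(blob)
    if not data:
        return []
    if layout == "row":
        return [data[i:i + width] for i in range(0, len(data), width)]
    n_rows = len(data) // width
    cols = [data[i:i + n_rows] for i in range(0, len(data), n_rows)]
    return [bytes(row) for row in zip(*cols)]


# Hypothetical table: 1000 fixed-width records of 16 bytes each.
rows = [f"{i:06d}|{i % 7:02d}|ACTIVE".encode() for i in range(1000)]
layout = train(rows[:50])          # learn from a small sample only
blob = compress(rows, layout)      # apply the learned layout to everything
restored = decompress(blob, layout, width=16)
```

The training step sees only the first 50 rows, yet its decision is applied to the whole table, and the round trip stays lossless whichever layout is chosen.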

Original language: English (US)
Pages: 175-184
Number of pages: 10
State: Published - 2000
Event: 11th Annual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, CA, USA
Duration: Jan 9 2000 - Jan 11 2000

Other

Other: 11th Annual ACM-SIAM Symposium on Discrete Algorithms
City: San Francisco, CA, USA
Period: 1/9/00 - 1/11/00

All Science Journal Classification (ASJC) codes

  • Software
  • Mathematics(all)
