Space lower bounds for distance approximation in the data stream model

Michael Saks, Xiaodong Sun

Research output: Contribution to journalConference articlepeer-review

75 Scopus citations

Abstract

We consider the problem of approximating the distance of two d-dimensional vectors x and y in the data stream model. In this model, the 2d coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector (x or y) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of Lp-distance with p ε [1, 2], there are good approximation algorithms that run in polylogarithmic space and d (here we assume that each coordinate is an integer with O(log d) bits). Here we prove that they do not exist for p > 2. In particular, we prove an optimal approximation-space tradeoff of approximating L distance of two vectors. We show that any randomized algorithm that approximates L distance of two length d vectors within factor of dδ requires Ω(d1-4δ) space. As a consequence we show that for p > 2/(1 - 4δ), any randomized algorithm that approximate Lp distance of two length d vectors within a factor dδ requires Ω(d1-2/p-4δ) space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.

Original languageEnglish (US)
Pages (from-to)360-369
Number of pages10
JournalConference Proceedings of the Annual ACM Symposium on Theory of Computing
DOIs
StatePublished - 2002
EventProceedings of the 34th Annual ACM Symposium on Theory of Computing - Montreal, Que., Canada
Duration: May 19 2002May 21 2002

All Science Journal Classification (ASJC) codes

  • Software

Fingerprint

Dive into the research topics of 'Space lower bounds for distance approximation in the data stream model'. Together they form a unique fingerprint.

Cite this