We consider the problem of approximating the distance of two d-dimensional vectors x and y in the data stream model. In this model, the 2d coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector (x or y) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of Lp-distance with p ε [1, 2], there are good approximation algorithms that run in polylogarithmic space and d (here we assume that each coordinate is an integer with O(log d) bits). Here we prove that they do not exist for p > 2. In particular, we prove an optimal approximation-space tradeoff of approximating L∞ distance of two vectors. We show that any randomized algorithm that approximates L∞ distance of two length d vectors within factor of dδ requires Ω(d1-4δ) space. As a consequence we show that for p > 2/(1 - 4δ), any randomized algorithm that approximate Lp distance of two length d vectors within a factor dδ requires Ω(d1-2/p-4δ) space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.
|Original language||English (US)|
|Number of pages||10|
|Journal||Conference Proceedings of the Annual ACM Symposium on Theory of Computing|
|State||Published - 2002|
|Event||Proceedings of the 34th Annual ACM Symposium on Theory of Computing - Montreal, Que., Canada|
Duration: May 19 2002 → May 21 2002
All Science Journal Classification (ASJC) codes