## Abstract

We consider the problem of approximating the distance of two d-dimensional vectors x and y in the data stream model. In this model, the 2d coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector (x or y) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of L^{p}-distance with p ε [1, 2], there are good approximation algorithms that run in polylogarithmic space and d (here we assume that each coordinate is an integer with O(log d) bits). Here we prove that they do not exist for p > 2. In particular, we prove an optimal approximation-space tradeoff of approximating L^{∞} distance of two vectors. We show that any randomized algorithm that approximates L^{∞} distance of two length d vectors within factor of d^{δ} requires Ω(d^{1-4δ}) space. As a consequence we show that for p > 2/(1 - 4δ), any randomized algorithm that approximate L^{p} distance of two length d vectors within a factor d^{δ} requires Ω(d^{1-2/p-4δ}) space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.

Original language | English (US) |
---|---|

Pages (from-to) | 360-369 |

Number of pages | 10 |

Journal | Conference Proceedings of the Annual ACM Symposium on Theory of Computing |

DOIs | |

State | Published - 2002 |

Event | Proceedings of the 34th Annual ACM Symposium on Theory of Computing - Montreal, Que., Canada Duration: May 19 2002 → May 21 2002 |

## All Science Journal Classification (ASJC) codes

- Software