TY - GEN

T1 - Compressed counting

AU - Li, Ping

PY - 2009

Y1 - 2009

N2 - We propose Compressed Counting (CC) for approximating the ath frequency moments (0 < α ≤ 2) of data streams under a relaxed strict-Turnstile model, using maximally-skewed stable random projections. Estimators based on the geometric mean and the harmonic mean are developed. When α = 1, a simple counter suffices for counting the first moment (i.e., sum). The geometric mean estimator of CC has asymptotic variance ∝ Δ = |α - 1|, capturing the intuition that the complexity should decrease as Δ = |α1| → 0. However, the previous classical algorithms based on symmetric stable random projections[12, 15] required O (1/∈2) space, in order to approximate the ath moments within a 1 + ∈ factor, for any 0 < α ≤ 2 including α = 1. We show that using the geometric mean estimator, CC requires O (1/log(1+∈) + 2√Δ/ log3/2(1+e) + o (√Δ)) space, as Δ → 0. Therefore, in the neighborhood of α = 1, the complexity of CC is essentially O (1/∈) instead of O (1/∈2). CC may be useful for estimating Shannon entropy, which can be approximated by certain functions of the ath moments with a α → 1. [10, 9] suggested using α = 1 + Δ with (e.g.,) Δ < 0.0001 and ∈ < 10-7, to rigorously ensure reasonable approximations. Thus, unfortunately, CC is "theoretically impractical" for estimating Shannon entropy, despite its empirical success reported in [16].

AB - We propose Compressed Counting (CC) for approximating the ath frequency moments (0 < α ≤ 2) of data streams under a relaxed strict-Turnstile model, using maximally-skewed stable random projections. Estimators based on the geometric mean and the harmonic mean are developed. When α = 1, a simple counter suffices for counting the first moment (i.e., sum). The geometric mean estimator of CC has asymptotic variance ∝ Δ = |α - 1|, capturing the intuition that the complexity should decrease as Δ = |α1| → 0. However, the previous classical algorithms based on symmetric stable random projections[12, 15] required O (1/∈2) space, in order to approximate the ath moments within a 1 + ∈ factor, for any 0 < α ≤ 2 including α = 1. We show that using the geometric mean estimator, CC requires O (1/log(1+∈) + 2√Δ/ log3/2(1+e) + o (√Δ)) space, as Δ → 0. Therefore, in the neighborhood of α = 1, the complexity of CC is essentially O (1/∈) instead of O (1/∈2). CC may be useful for estimating Shannon entropy, which can be approximated by certain functions of the ath moments with a α → 1. [10, 9] suggested using α = 1 + Δ with (e.g.,) Δ < 0.0001 and ∈ < 10-7, to rigorously ensure reasonable approximations. Thus, unfortunately, CC is "theoretically impractical" for estimating Shannon entropy, despite its empirical success reported in [16].

UR - http://www.scopus.com/inward/record.url?scp=70349114207&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349114207&partnerID=8YFLogxK

U2 - 10.1137/1.9781611973068.46

DO - 10.1137/1.9781611973068.46

M3 - Conference contribution

AN - SCOPUS:70349114207

SN - 9780898716801

T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

SP - 412

EP - 421

BT - Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms

PB - Association for Computing Machinery (ACM)

T2 - 20th Annual ACM-SIAM Symposium on Discrete Algorithms

Y2 - 4 January 2009 through 6 January 2009

ER -