Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance

Michael Saks, C. Seshadhri

Research output: Chapter in Book/Report/Conference proceedingConference contribution

21 Scopus citations

Abstract

Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array of length n provides a (1 + δ)-multiplicative approximation to the distance to monotonicity (n minus the length of the LIS), and uses only O((log2 n)/δ) space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. The improved approximation factor reflects a qualitative difference between our algorithm and previous algorithms: previous polylogarithmic space algorithms could not reliably detect increasing subsequences of length as large as n/2, while ours can detect increasing subsequences of length βn for any β > 0. More precisely, our algorithm can be used to estimate the length of the LIS to within an additive δn for any δ > 0 while previous algorithms could only achieve additive error n(1/2 - o(1)). Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also show how our technique can be applied to other problems solvable by dynamic programs. For example, we give a streaming algorithm for approximating LCS(x, y), the length of the longest common subsequence between strings x and y, each of length n. Our algorithm works in the asymmetric setting (inspired by [AKO10]), in which we have random access to y and streaming access to x, and runs in small space provided that no single symbol appears very often in y. More precisely, it gives an additive-δn approximation to LCS(x,y) (and hence also to E(x,y) = n - LCS(x,y), the edit distance between x and y when insertions and deletions, but not substitutions, are allowed), with space complexity O(k(log2 n)/δ), where k is the maximum number of times any one symbol appears in y. We also provide a deterministic 1-pass streaming algorithm that outputs a (1 + δ)-multiplicative approximation for E(x,y) (which is also an additive δn-approximation), in the asymmetric setting, and uses O(√n/δlog(n)) space. All these algorithms are obtained by carefully trading space and accuracy within a standard dynamic program.

Original languageEnglish (US)
Title of host publicationProceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
Pages1698-1709
Number of pages12
StatePublished - 2013
Event24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013 - New Orleans, LA, United States
Duration: Jan 6 2013Jan 8 2013

Publication series

NameProceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

Other

Other24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
Country/TerritoryUnited States
CityNew Orleans, LA
Period1/6/131/8/13

All Science Journal Classification (ASJC) codes

  • Software
  • General Mathematics

Fingerprint

Dive into the research topics of 'Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance'. Together they form a unique fingerprint.

Cite this