TY - GEN
T1 - Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance
AU - Saks, Michael
AU - Seshadhri, C.
PY - 2013
Y1 - 2013
N2 - Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array of length n provides a (1 + δ)-multiplicative approximation to the distance to monotonicity (n minus the length of the LIS), and uses only O((log2 n)/δ) space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. The improved approximation factor reflects a qualitative difference between our algorithm and previous algorithms: previous polylogarithmic space algorithms could not reliably detect increasing subsequences of length as large as n/2, while ours can detect increasing subsequences of length βn for any β > 0. More precisely, our algorithm can be used to estimate the length of the LIS to within an additive δn for any δ > 0 while previous algorithms could only achieve additive error n(1/2 - o(1)). Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also show how our technique can be applied to other problems solvable by dynamic programs. For example, we give a streaming algorithm for approximating LCS(x, y), the length of the longest common subsequence between strings x and y, each of length n. Our algorithm works in the asymmetric setting (inspired by [AKO10]), in which we have random access to y and streaming access to x, and runs in small space provided that no single symbol appears very often in y. More precisely, it gives an additive-δn approximation to LCS(x,y) (and hence also to E(x,y) = n - LCS(x,y), the edit distance between x and y when insertions and deletions, but not substitutions, are allowed), with space complexity O(k(log2 n)/δ), where k is the maximum number of times any one symbol appears in y. We also provide a deterministic 1-pass streaming algorithm that outputs a (1 + δ)-multiplicative approximation for E(x,y) (which is also an additive δn-approximation), in the asymmetric setting, and uses O(√n/δlog(n)) space. All these algorithms are obtained by carefully trading space and accuracy within a standard dynamic program.
AB - Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array of length n provides a (1 + δ)-multiplicative approximation to the distance to monotonicity (n minus the length of the LIS), and uses only O((log2 n)/δ) space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. The improved approximation factor reflects a qualitative difference between our algorithm and previous algorithms: previous polylogarithmic space algorithms could not reliably detect increasing subsequences of length as large as n/2, while ours can detect increasing subsequences of length βn for any β > 0. More precisely, our algorithm can be used to estimate the length of the LIS to within an additive δn for any δ > 0 while previous algorithms could only achieve additive error n(1/2 - o(1)). Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also show how our technique can be applied to other problems solvable by dynamic programs. For example, we give a streaming algorithm for approximating LCS(x, y), the length of the longest common subsequence between strings x and y, each of length n. Our algorithm works in the asymmetric setting (inspired by [AKO10]), in which we have random access to y and streaming access to x, and runs in small space provided that no single symbol appears very often in y. More precisely, it gives an additive-δn approximation to LCS(x,y) (and hence also to E(x,y) = n - LCS(x,y), the edit distance between x and y when insertions and deletions, but not substitutions, are allowed), with space complexity O(k(log2 n)/δ), where k is the maximum number of times any one symbol appears in y. We also provide a deterministic 1-pass streaming algorithm that outputs a (1 + δ)-multiplicative approximation for E(x,y) (which is also an additive δn-approximation), in the asymmetric setting, and uses O(√n/δlog(n)) space. All these algorithms are obtained by carefully trading space and accuracy within a standard dynamic program.
UR - http://www.scopus.com/inward/record.url?scp=84876025531&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876025531&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84876025531
SN - 9781611972511
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 1698
EP - 1709
BT - Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
T2 - 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
Y2 - 6 January 2013 through 8 January 2013
ER -