TY - GEN
T1 - A Comprehensively Tight Analysis of Gradient Descent for PCA
AU - Xu, Zhiqiang
AU - Li, Ping
N1 - Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
AB - We study the Riemannian gradient method for PCA, for which, despite the simplicity of the considered setting, i.e., the deterministic version of Krasulina’s method, the convergence rate has not yet been well understood. In this work, we provide a general tight analysis of the gap-dependent rate at O((1/∆) log(1/ϵ)) that holds for any real symmetric matrix. More importantly, when the gap ∆ is significantly smaller than the target accuracy ϵ on the objective sub-optimality of the final solution, a rate of this type is no longer tight, which calls for a worst-case rate. We further give the first worst-case analysis, achieving a convergence rate of O((1/ϵ) log(1/ϵ)). The two analyses together yield a comprehensively tight convergence rate of O((1/max{∆, ϵ}) log(1/ϵ)). In particular, our gap-dependent analysis suggests a new promising learning rate for stochastic variance-reduced PCA algorithms. Experiments are also conducted to confirm our findings.
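For context, the method analyzed in the abstract is the deterministic version of Krasulina’s update, viewed as Riemannian gradient ascent on the unit sphere for the leading eigenvector of a real symmetric matrix A. The following is a minimal illustrative sketch, not the paper’s code; the step size eta, the stopping rule, and the random initialization are assumptions made for this example.

```python
# Minimal sketch: Riemannian gradient ascent for the leading eigenvector of a
# real symmetric matrix A (deterministic Krasulina update + retraction to the
# unit sphere). eta, tol, max_iter are illustrative choices, not the paper's.
import numpy as np

def riemannian_gradient_pca(A, eta=0.1, tol=1e-8, max_iter=10000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)              # start on the unit sphere
    for _ in range(max_iter):
        Ax = A @ x
        rayleigh = x @ Ax               # Rayleigh quotient x^T A x (objective)
        grad = Ax - rayleigh * x        # Riemannian gradient (I - x x^T) A x
        if np.linalg.norm(grad) < tol:
            break
        x = x + eta * grad              # Krasulina-style gradient step
        x /= np.linalg.norm(x)          # retraction back onto the sphere
    return x, x @ (A @ x)

# Usage: random symmetric test matrix; compare with a dense eigensolver.
A = np.random.default_rng(1).standard_normal((50, 50))
A = (A + A.T) / 2
x, val = riemannian_gradient_pca(A)
print(val, np.linalg.eigvalsh(A)[-1])   # the two values should be close
```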
UR - http://www.scopus.com/inward/record.url?scp=85121139912&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121139912&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85121139912
T3 - Advances in Neural Information Processing Systems
SP - 21935
EP - 21946
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -