Surpassing gradient descent provably: A cyclic incremental method with linear convergence rate

Aryan Mokhtari, Mert Gürbüzbalaban, Alejandro Ribeiro

Research output: Contribution to journal › Article


Abstract

Recently, there has been growing interest in developing optimization methods for solving large-scale machine learning problems. Most of these problems boil down to minimizing an average of a finite set of smooth and strongly convex functions, where the number of functions n is large. The gradient descent (GD) method minimizes such problems at a fast linear rate; however, it is impractical in this large-scale setting because each iteration requires evaluating all n gradients. Incremental methods resolve this drawback by replacing the full gradient in the descent direction with an incremental gradient approximation: they evaluate only one gradient per iteration and use the average of the n most recently computed gradients as an approximation of the full gradient. Although incremental methods reduce the per-iteration cost of GD, their known convergence rates do not justify their advantage over GD in terms of the total number of gradient evaluations required to reach convergence. In this paper, we introduce the double incremental aggregated gradient method (DIAG), which computes the gradient of only one function at each iteration, chosen according to a cyclic scheme, and uses the aggregated average of all n gradients to approximate the full gradient. In contrast to classic incremental methods, which aggregate only gradients, DIAG uses averages of both iterates and gradients. We prove not only that DIAG converges linearly to the optimal solution, but also that its linear convergence factor justifies the advantage of incremental methods over GD. In particular, we prove that the worst-case performance of DIAG is better than the worst-case performance of GD. Numerical experiments on quadratic programming and logistic regression problems showcase the advantage of DIAG relative to GD and other incremental methods.
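
To make the update rule described above concrete, here is a minimal Python sketch of a DIAG-style iteration on a synthetic least-squares problem. It is not the authors' implementation: the function name diag_sketch, the test problem f_i(x) = 0.5||A_i x - b_i||^2, the step size 1/L, and the exact averaging rule (mean of stored iterates minus a scaled mean of stored gradients) are illustrative assumptions based on the abstract's description of averaging both iterates and gradients with a cyclic refresh.

```python
# Hypothetical sketch of a DIAG-style update (assumptions noted above), applied to
# a synthetic least-squares problem f_i(x) = 0.5 * ||A_i x - b_i||^2.
import numpy as np

def diag_sketch(A_list, b_list, num_passes=50, x0=None):
    n = len(A_list)
    d = A_list[0].shape[1]
    x = np.zeros(d) if x0 is None else x0.copy()

    # Smoothness constant of each f_i is the largest eigenvalue of A_i^T A_i;
    # using the maximum over i as a common constant L (an assumption).
    L = max(np.linalg.eigvalsh(A.T @ A).max() for A in A_list)

    # Stored copies: y[i] is the iterate at which grad f_i was last evaluated,
    # g[i] is that stored gradient.
    y = np.tile(x, (n, 1))
    g = np.array([A.T @ (A @ x - b) for A, b in zip(A_list, b_list)])

    for k in range(num_passes * n):
        i = k % n  # cyclic choice of the function to refresh
        # Update uses averages of both stored iterates and stored gradients.
        x = y.mean(axis=0) - g.mean(axis=0) / L
        # Refresh only the i-th stored iterate and gradient, so each
        # iteration costs a single gradient evaluation.
        y[i] = x
        g[i] = A_list[i].T @ (A_list[i] @ x - b_list[i])
    return x
```

As a usage example, calling diag_sketch with a list of random full-column-rank matrices A_i and vectors b_i returns an approximate minimizer of the average of the n quadratics; the per-iteration cost is one gradient evaluation plus the maintenance of the two running averages.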

Original language: English (US)
Pages (from-to): 1420-1447
Number of pages: 28
Journal: SIAM Journal on Optimization
Volume: 28
Issue number: 2
DOIs
State: Published - Jan 1 2018

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science

Keywords

  • Finite sum minimization
  • Incremental methods
  • Large-scale optimization
  • Linear convergence rate
  • Worst-case analysis