Data-dependent bounds on network gradient descent

Avleen Bijral, Anand D. Sarwate, Nathan Srebro

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

We study a consensus-based distributed stochastic gradient method for distributed optimization in a setting common for machine learning applications. Nodes in the network hold disjoint data and seek to optimize a common objective which decomposes into a sum of convex functions of individual data points. We show that the rate of convergence for this method involves the spectral properties of two matrices: the standard spectral gap of a weight matrix from the network topology and a new term depending on the spectral norm of the sample covariance matrix of the data. This result shows the benefit of datasets with small spectral norm. Extensions of the method can identify the impact of limited communication, increasing the number of nodes, and scaling with data set size.
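The abstract does not state the update rule, so the following is only a minimal sketch of one common form of consensus-based distributed stochastic gradient descent, not the authors' exact method. The ring topology, the least-squares loss, the constant step size, and all function names (`ring_weights`, `spectral_gap`, `covariance_spectral_norm`, `consensus_sgd`) are assumptions made for illustration. It also computes the two spectral quantities the abstract mentions: the spectral gap of the weight matrix and the spectral norm of the sample covariance matrix.

```python
import numpy as np

def ring_weights(m):
    """Doubly stochastic weight matrix for a ring of m nodes (assumed topology)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] += 0.5              # keep half the weight on the node itself
        W[i, (i - 1) % m] += 0.25   # left neighbour
        W[i, (i + 1) % m] += 0.25   # right neighbour
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of a symmetric W."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - mags[1]

def covariance_spectral_norm(X):
    """Largest singular value of the (uncentered) sample covariance X^T X / N."""
    return np.linalg.norm(X.T @ X / X.shape[0], 2)

def consensus_sgd(X, y, W, steps=2000, lr=0.05, seed=0):
    """Each node averages its neighbours' iterates, then takes one
    stochastic gradient step on a sample drawn from its own disjoint shard.
    The loss is least squares (an assumption chosen for illustration)."""
    rng = np.random.default_rng(seed)
    m, d = W.shape[0], X.shape[1]
    shards = np.array_split(np.arange(X.shape[0]), m)  # disjoint data per node
    w = np.zeros((m, d))                               # one iterate per node
    for _ in range(steps):
        grads = np.zeros_like(w)
        for i, idx in enumerate(shards):
            j = rng.choice(idx)                        # sample a local data point
            grads[i] = (X[j] @ w[i] - y[j]) * X[j]     # gradient of 0.5*(x.w - y)^2
        w = W @ w - lr * grads                         # consensus step + gradient step
    return w

# Small synthetic usage example (noiseless linear data, 4 nodes).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
W = ring_weights(4)
print("spectral gap of W:", spectral_gap(W))
print("spectral norm of sample covariance:", covariance_spectral_norm(X))
print("node iterates:\n", consensus_sgd(X, y, W))
```

In this sketch the weight matrix controls how quickly the node iterates agree with each other, while the data enter only through the sampled gradients; the paper's actual bound and step-size schedule should be taken from the paper itself.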

Original language: English (US)
Title of host publication: 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 869-874
Number of pages: 6
ISBN (Electronic): 9781509045495
State: Published - Feb 10 2017
Event: 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016 - Monticello, United States
Duration: Sep 27 2016 to Sep 30 2016

Publication series

Name: 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016

Other

Other: 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2016
Country/Territory: United States
City: Monticello
Period: 9/27/16 to 9/30/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Control and Optimization
