ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning

Zhixiong Yang, Waheed U. Bajwa

Research output: Contribution to journal › Article

Abstract

Distributed machine learning algorithms enable learning of models from datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional occur frequently in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional learning in fully distributed (also known as decentralized) settings. In this paper, an algorithm termed Byzantine-resilient distributed coordinate descent is developed and analyzed that enables distributed learning in the presence of Byzantine failures. Theoretical analysis (convex settings) and numerical experiments (convex and nonconvex settings) highlight its usefulness for high-dimensional distributed learning in the presence of Byzantine failures.
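
The abstract does not spell out the mechanics, but the general idea behind Byzantine-resilient coordinate-wise methods of this kind can be illustrated with a short sketch: each node updates one coordinate at a time and, before combining the values it receives from neighbors, screens out the most extreme reports so that a bounded number of Byzantine neighbors cannot drag the update arbitrarily far. The sketch below is a hypothetical illustration only, not the paper's ByRDiE algorithm; the function names, the trimmed-mean screening rule, and the parameters (step size, screening budget b) are assumptions made for exposition.

```python
import numpy as np


def screened_average(own_val, neighbor_vals, b):
    """Trimmed screening of one coordinate: sort the neighbors' reported
    values, drop the b largest and b smallest, then average what is left
    together with the node's own value. With at most b Byzantine neighbors,
    every surviving value lies within the range of the honest reports.
    Illustrative rule only; not claimed to be the paper's exact screening."""
    vals = np.sort(np.asarray(neighbor_vals, dtype=float))
    kept = vals[b:len(vals) - b] if len(vals) > 2 * b else np.empty(0)
    pool = np.concatenate(([float(own_val)], kept))
    return pool.mean()


def resilient_coordinate_step(x, k, local_grad_k, neighbor_k_vals,
                              step_size=0.1, b=1):
    """One coordinate-descent step at a single node (hypothetical names):
    screen the neighbors' copies of coordinate k, then take a gradient
    step on that coordinate using the node's local data."""
    x = x.copy()
    x[k] = screened_average(x[k], neighbor_k_vals, b) - step_size * local_grad_k
    return x


# Toy usage: a 5-dimensional model; neighbors report coordinate k = 2,
# and one report (50.0) is arbitrary, i.e., Byzantine. The screening
# discards it, so the update stays close to the honest values.
x = np.zeros(5)
reports = [0.1, -0.2, 0.05, 50.0]
x = resilient_coordinate_step(x, k=2, local_grad_k=0.3,
                              neighbor_k_vals=reports, b=1)
print(x)  # coordinate 2 moves modestly despite the outlier report
```

Working one coordinate at a time keeps every exchanged message a single scalar, which is one reason coordinate-wise schemes are attractive for high-dimensional decentralized learning; the actual ByRDiE update, its consensus iterations, and its convergence guarantees are specified in the paper itself.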

Original language: English (US)
Article number: 8759887
Pages (from-to): 611-627
Number of pages: 17
Journal: IEEE Transactions on Signal and Information Processing over Networks
Volume: 5
Issue number: 4
DOIs: 10.1109/TSIPN.2019.2928176
State: Published - Dec 2019

Fingerprint

  • Parallel algorithms
  • Fault tolerance
  • Learning algorithms
  • Learning systems
  • Experiments

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Information Systems
  • Computer Networks and Communications

Keywords

  • Byzantine failure
  • consensus
  • coordinate descent
  • decentralized learning
  • distributed optimization
  • empirical risk minimization
  • machine learning

Cite this

@article{df18248fcb96468686f272c315136a41,
title = "ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning",
abstract = "Distributed machine learning algorithms enable learning of models from datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional occur frequently in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional learning in fully distributed (also known as decentralized) settings. In this paper, an algorithm termed Byzantine-resilient distributed coordinate descent is developed and analyzed that enables distributed learning in the presence of Byzantine failures. Theoretical analysis (convex settings) and numerical experiments (convex and nonconvex settings) highlight its usefulness for high-dimensional distributed learning in the presence of Byzantine failures.",
keywords = "Byzantine failure, consensus, coordinate descent, decentralized learning, distributed optimization, empirical risk minimization, machine learning",
author = "Zhixiong Yang and Bajwa, {Waheed U.}",
year = "2019",
month = "12",
doi = "10.1109/TSIPN.2019.2928176",
language = "English (US)",
volume = "5",
pages = "611--627",
journal = "IEEE Transactions on Signal and Information Processing over Networks",
issn = "2373-776X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning. / Yang, Zhixiong; Bajwa, Waheed U.

In: IEEE Transactions on Signal and Information Processing over Networks, Vol. 5, No. 4, 8759887, 12.2019, p. 611-627.
