Abstract
Scaling deep neural network training to more processors and larger batch sizes is key to reducing end-to-end training time, yet maintaining comparable convergence and hardware utilization at larger scales is challenging. Increases in training scale have made natural gradient optimization methods a reasonable alternative to stochastic gradient descent and its variants. Kronecker-factored Approximate Curvature (K-FAC), a natural gradient method, preconditions gradients with an efficient approximation of the Fisher Information Matrix to improve per-iteration progress when optimizing an objective function. Here we propose a scalable K-FAC algorithm and investigate K-FAC's applicability in large-scale deep neural network training. Specifically, we explore layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling, with the goal of preserving convergence while minimizing training time. We evaluate the convergence and scaling properties of our K-FAC gradient preconditioner for image classification, object detection, and language modeling applications. In all applications, our implementation converges to baseline performance targets in 9-25% less time than the standard first-order optimizers on GPU clusters across a variety of scales.
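As an illustration of what "preconditioning gradients with a Kronecker-factored approximation" means for a single fully connected layer, the following is a minimal NumPy sketch of the textbook K-FAC update, not the distributed implementation evaluated in the paper. The function name `kfac_precondition`, its arguments, and the `damping` default are assumptions made for this example. It also uses linear solves rather than the inverse-free evaluation the abstract mentions, and it damps each factor separately, which is a common simplification.

```python
import numpy as np

def kfac_precondition(grad_w, a, g, damping=1e-3):
    """Precondition one layer's weight gradient with Kronecker factors.

    grad_w: (out, in) gradient of the loss w.r.t. the layer weights
    a:      (batch, in) inputs to the layer
    g:      (batch, out) gradients of the loss w.r.t. the layer outputs
    damping: hypothetical default, chosen only for illustration
    """
    batch = a.shape[0]
    # Kronecker factors: second moments of inputs and of output gradients.
    A = a.T @ a / batch                      # (in, in)
    G = g.T @ g / batch                      # (out, out)
    # Damp each factor separately so both are invertible.
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    # Compute G_d^{-1} grad_w A_d^{-1} with two small solves instead of
    # inverting the full (in*out) x (in*out) Fisher approximation.
    left = np.linalg.solve(G_d, grad_w)      # G_d^{-1} grad_w
    return np.linalg.solve(A_d, left.T).T    # valid because A_d is symmetric

# Usage: random data standing in for one mini-batch of a 3 -> 2 layer.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 3))
g = rng.standard_normal((8, 2))
grad_w = g.T @ a / 8
preconditioned = kfac_precondition(grad_w, a, g, damping=0.1)
```

The two small solves give the same result as solving against the full Kronecker product of the damped factors applied to the column-stacked gradient, which is why the factorization keeps the per-layer cost manageable.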
| Original language | English (US) |
|---|---|
| Pages (from-to) | 3616-3627 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| Volume | 33 |
| Issue number | 12 |
| State | Published - Dec 1 2022 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Hardware and Architecture
- Computational Theory and Mathematics
Keywords
- Optimization methods
- high-performance computing
- neural networks
- scalability
Fingerprint
Dive into the research topics of 'Deep Neural Network Training With Distributed K-FAC'. Together they form a unique fingerprint.