TY - GEN
T1 - ImageNet training in minutes
AU - You, Yang
AU - Zhang, Zhao
AU - Hsieh, Cho-Jui
AU - Demmel, James
AU - Keutzer, Kurt
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/8/13
Y1 - 2018/8/13
AB - In this paper, we investigate the capability of large-scale computers to speed up deep neural network (DNN) training. Our approach is to use a large batch size, powered by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, to make efficient use of massive computing resources. Our approach is generic: we empirically evaluate its effectiveness on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset while preserving state-of-the-art test accuracy. Compared to the baseline of a previous study by researchers at Facebook, our approach achieves higher test accuracy at batch sizes larger than 16K. Using 2,048 Intel Xeon Platinum 8160 processors, we reduce the 100-epoch AlexNet training time from hours to 11 minutes. Using 2,048 Intel Xeon Phi 7250 processors, we reduce the 90-epoch ResNet-50 training time from hours to 20 minutes. Our implementation is open source and has been released in the Intel distribution of Caffe v1.0.7.
KW - Distributed machine learning
KW - Fast deep neural networks training
UR - https://www.scopus.com/pages/publications/85054858257
U2 - 10.1145/3225058.3225069
DO - 10.1145/3225058.3225069
M3 - Conference contribution
AN - SCOPUS:85054858257
SN - 9781450365109
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018
PB - Association for Computing Machinery
T2 - 47th International Conference on Parallel Processing, ICPP 2018
Y2 - 14 August 2018 through 16 August 2018
ER -