TY - GEN
T1 - Quantized Densely Connected U-Nets for Efficient Landmark Localization
AU - Tang, Zhiqiang
AU - Peng, Xi
AU - Geng, Shijie
AU - Wu, Lingfei
AU - Zhang, Shaoting
AU - Metaxas, Dimitris
N1 - Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - In this paper, we propose quantized densely connected U-Nets for efficient visual landmark localization. The idea is that features of the same semantic meaning are globally reused across the stacked U-Nets. This dense connectivity largely improves the information flow, yielding improved localization accuracy. However, a vanilla dense design would suffer from critical efficiency issues in both training and testing. To solve this problem, we first propose order-K dense connectivity to trim off long-distance shortcuts; then, we use a memory-efficient implementation to significantly boost the training efficiency and investigate an iterative refinement that may slice the model size in half. Finally, to reduce the memory consumption and high-precision operations both in training and testing, we further quantize weights, inputs, and gradients of our localization network to low bit-width numbers. We validate our approach in two tasks: human pose estimation and face alignment. The results show that our approach achieves state-of-the-art localization accuracy while using ~70% fewer parameters and ~98% less model size, and saving ~32× training memory compared with other benchmark localizers.
AB - In this paper, we propose quantized densely connected U-Nets for efficient visual landmark localization. The idea is that features of the same semantic meaning are globally reused across the stacked U-Nets. This dense connectivity largely improves the information flow, yielding improved localization accuracy. However, a vanilla dense design would suffer from critical efficiency issues in both training and testing. To solve this problem, we first propose order-K dense connectivity to trim off long-distance shortcuts; then, we use a memory-efficient implementation to significantly boost the training efficiency and investigate an iterative refinement that may slice the model size in half. Finally, to reduce the memory consumption and high-precision operations both in training and testing, we further quantize weights, inputs, and gradients of our localization network to low bit-width numbers. We validate our approach in two tasks: human pose estimation and face alignment. The results show that our approach achieves state-of-the-art localization accuracy while using ~70% fewer parameters and ~98% less model size, and saving ~32× training memory compared with other benchmark localizers.
UR - http://www.scopus.com/inward/record.url?scp=85053852281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053852281&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01219-9_21
DO - 10.1007/978-3-030-01219-9_21
M3 - Conference contribution
AN - SCOPUS:85053852281
SN - 9783030012182
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 348
EP - 364
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Hebert, Martial
A2 - Weiss, Yair
PB - Springer Verlag
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -