TY - GEN
T1 - Self-attention generative adversarial networks
AU - Zhang, Han
AU - Goodfellow, Ian
AU - Metaxas, Dimitris
AU - Odena, Augustus
PY - 2019/1/1
Y1 - 2019/1/1
N2 - In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
AB - In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
UR - http://www.scopus.com/inward/record.url?scp=85078069179&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078069179&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 12744
EP - 12753
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -