StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks

Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

Research output: Chapter in Book/Report/Conference proceedingConference contribution

460 Scopus citations

Abstract

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing textto- image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256.256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.

Original languageEnglish (US)
Title of host publicationProceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5908-5916
Number of pages9
ISBN (Electronic)9781538610329
DOIs
StatePublished - Dec 22 2017
Event16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy
Duration: Oct 22 2017Oct 29 2017

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
Volume2017-October
ISSN (Print)1550-5499

Other

Other16th IEEE International Conference on Computer Vision, ICCV 2017
CountryItaly
CityVenice
Period10/22/1710/29/17

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint Dive into the research topics of 'StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks'. Together they form a unique fingerprint.

  • Cite this

    Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. (2017). StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017 (pp. 5908-5916). [8237891] (Proceedings of the IEEE International Conference on Computer Vision; Vol. 2017-October). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICCV.2017.629