Learning to forecast and refine residual motion for image-to-video generation

Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris Metaxas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object. Recent methods for such problems typically train transformation networks to generate future frames conditioned on the structure sequence. Parallel work has shown that short high-quality motions can be generated by spatiotemporal generative networks that leverage temporal knowledge from the training data. We combine the benefits of both approaches and propose a two-stage generation framework where videos are generated from structures and then refined by temporal signals. To model motions more efficiently, we train networks to learn residual motion between the current and future frames, which avoids learning motion-irrelevant details. We conduct extensive experiments on two image-to-video translation tasks: facial expression retargeting and human pose forecasting. Superior results over the state-of-the-art methods on both tasks demonstrate the effectiveness of our approach.
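As a rough illustration of the residual idea mentioned in the abstract, the sketch below treats a frame as a small grid of pixel intensities and computes the difference between the current and the future frame. It is not the authors' implementation: the plain per-pixel difference, the function names `residual_target` and `apply_residual`, and the toy frames are all assumptions made for illustration, and the networks that would predict the residual are not shown.

```python
def residual_target(current, future):
    """Per-pixel difference between the future and current frame.

    Assumed form of the residual; static pixels yield zeros.
    """
    return [[f - c for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(current, future)]


def apply_residual(current, residual):
    """Rebuild a future frame by adding a residual to the current frame."""
    return [[c + r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current, residual)]


# Toy 2x3 frames: a bright pixel moves one step to the right.
current = [[10, 10, 10],
           [10, 50, 10]]
future = [[10, 10, 10],
          [10, 10, 50]]

residual = residual_target(current, future)
reconstructed = apply_residual(current, residual)
```

In this toy case the residual is non-zero only at the two pixels touched by the motion, which is the sense in which a residual target leaves out motion-irrelevant detail such as a static background.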

Original language: English (US)
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Yair Weiss, Vittorio Ferrari, Cristian Sminchisescu, Martial Hebert
Publisher: Springer Verlag
Pages: 403-419
Number of pages: 17
ISBN (Print): 9783030012663
State: Published - 2018
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: Sep 8 2018 → Sep 14 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11219 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th European Conference on Computer Vision, ECCV 2018
Country/Territory: Germany
City: Munich
Period: 9/8/18 → 9/14/18

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Motion forecasting
  • Residual learning
  • Video generation
