Learning Disentangled Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach

Minyoung Kim, Ricardo Guerrero, Vladimir Pavlovic

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

1 Scopus citation

Abstract

We tackle the problem of learning the underlying disentangled latent factors shared between paired bi-modal data in cross-modal retrieval. The data in both modalities are typically complex, structured, and high-dimensional (e.g., images and text), so conventional deep auto-encoding latent variable models such as the Variational Autoencoder (VAE) often have difficulty training an accurate decoder or synthesizing realistic samples. In this paper we propose the implicit decoder, which removes the ambient-data decoding module from the latent variable model entirely: decoding is replaced by implicit inversion of the encoder, achieved through Jacobian regularization of the low-dimensional embedding function. Motivated by the recent Identifiable VAE (IVAE) model, we modify it to incorporate the query-modality data as a conditioning auxiliary input, which allows us to prove that the true model parameters are identifiable under certain regularity conditions. Tested on various datasets where the true factors are fully or partially available, our model identifies the factors accurately and significantly outperforms conventional latent variable models.
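To make the "query modality as conditioning auxiliary input" idea concrete, here is a minimal sketch of the conditional-prior part of an IVAE-style objective, written under loudly stated assumptions; it is not the authors' code. In the original IVAE formulation the latent prior p(z | u) is conditionally factorial given an auxiliary variable u. The sketch assumes that prior is a factorized Gaussian whose mean and log-variance are linear in a hypothetical query-modality feature vector u. The function names (`conditional_prior`, `log_prior`, `kl_to_prior`, `sample_posterior`) and all weights are illustrative. The data-likelihood term, the implicit decoder, and the Jacobian regularizer are deliberately omitted because their exact form is not given here.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, u: Sequence[float]) -> Vector:
    """Plain-Python matrix-vector product: each row of W dotted with u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def conditional_prior(u: Sequence[float], W_mu: Matrix, W_logvar: Matrix) -> Tuple[Vector, Vector]:
    """Factorized Gaussian prior p(z | u): per-dimension mean and log-variance.

    Hypothetical parameterization: both are linear in the auxiliary input u
    (assumed here to be a feature vector of the query modality). A real model
    would learn neural networks for these maps instead.
    """
    return matvec(W_mu, u), matvec(W_logvar, u)


def log_prior(z: Sequence[float], mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """log p(z | u) = sum_i log N(z_i; mu_i(u), exp(logvar_i(u)))."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi) + lv + (zi - m) ** 2 / math.exp(lv))
        for zi, m, lv in zip(z, mu_p, logvar_p)
    )


def kl_to_prior(mu_q, logvar_q, mu_p, logvar_p) -> float:
    """Closed-form KL(q(z | x) || p(z | u)) between diagonal Gaussians."""
    return sum(
        0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
        for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p)
    )


def sample_posterior(mu_q, logvar_q, rng: random.Random) -> Vector:
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu_q, logvar_q)]


if __name__ == "__main__":
    u = [1.0, 0.5]                       # hypothetical query-modality features
    W_mu = [[1.0, 0.0], [0.0, 2.0]]      # hypothetical prior-mean weights
    W_logvar = [[0.0, 0.0], [0.0, 0.0]]  # unit prior variances
    mu_p, logvar_p = conditional_prior(u, W_mu, W_logvar)
    mu_q, logvar_q = [0.8, 1.2], [-0.5, -0.5]  # hypothetical encoder outputs for a target item
    print("prior mean:", mu_p)
    print("KL(q || p):", kl_to_prior(mu_q, logvar_q, mu_p, logvar_p))
    z = sample_posterior(mu_q, logvar_q, random.Random(0))
    print("log p(z | u):", log_prior(z, mu_p, logvar_p))
```

The design point the sketch illustrates is that the prior itself moves with u: two target items paired with different queries are pulled toward different regions of latent space. This kind of conditioning is what IVAE-style identifiability arguments rely on.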

Original language: English (US)
Title of host publication: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 2862-2870
Number of pages: 9
ISBN (Electronic): 9781450386517
State: Published - Oct 17 2021
Externally published: Yes
Event: 29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China
Duration: Oct 20 2021 to Oct 24 2021

Publication series

Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Conference: 29th ACM International Conference on Multimedia, MM 2021
Country/Territory: China
City: Virtual, Online
Period: 10/20/21 to 10/24/21

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Software
  • Computer Graphics and Computer-Aided Design

Keywords

  • cross-modal retrieval
  • factor analysis
  • latent variable model
  • multi-modal data analysis
