Human conversation analysis using attentive multimodal networks with hierarchical encoder-decoder

Yue Gu, Shiyu Fu, Xinyu Li, Kangning Yang, Kaixiang Huang, Shuhong Chen, Moliang Zhou, Ivan Marsic

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Scopus citations

Abstract

Human conversation analysis is challenging because meaning can be expressed through words, intonation, or even body language and facial expression. We introduce a hierarchical encoder-decoder structure with an attention mechanism for conversation analysis. The hierarchical encoder learns word-level features from video, audio, and text data, which are then formulated into conversation-level features. The corresponding hierarchical decoder can predict different attributes at given time instances. To integrate multiple sensory inputs, we introduce a novel fusion strategy with modality attention. We evaluated our system on published emotion recognition, sentiment analysis, and speaker trait analysis datasets. Our system outperformed previous state-of-the-art approaches in both classification and regression tasks on three datasets. We also outperformed previous approaches in generalization tests on two commonly used datasets. Using the single proposed model instead of multiple individual models, we achieved comparable performance in predicting co-existing labels. In addition, the easily visualized modality and temporal attention demonstrated that the proposed attention mechanism aids feature selection and improves model interpretability.
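
The abstract does not spell out how the modality attention is computed. As a rough illustration only, and not the authors' implementation, one common way to fuse several modalities with attention is to score each modality's feature vector, normalize the scores with a softmax, and take the weighted sum. In the sketch below, the function names, the dot-product scoring, and the `scoring_weights` parameter are all assumptions made for this example; the only thing taken from the abstract is the three modalities (text, audio, video).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_attention_fusion(features, scoring_weights):
    """Fuse per-modality feature vectors with attention weights.

    features: dict mapping modality name -> list of floats (equal lengths).
    scoring_weights: dict mapping modality name -> list of floats used to
        score that modality by a dot product (hypothetical; in a trained
        model these would be learned parameters).
    Returns (fused_vector, attention_by_modality).
    """
    names = sorted(features)
    # One scalar relevance score per modality.
    scores = [
        sum(w * x for w, x in zip(scoring_weights[name], features[name]))
        for name in names
    ]
    # Attention weights are non-negative and sum to 1 across modalities.
    attention = softmax(scores)
    dim = len(features[names[0]])
    # Fused vector is the attention-weighted sum of the modality vectors.
    fused = [
        sum(attention[i] * features[name][k] for i, name in enumerate(names))
        for k in range(dim)
    ]
    return fused, dict(zip(names, attention))


# Toy word-level features for one time step (made-up values).
word_features = {
    "text": [1.0, 0.0],
    "audio": [0.0, 1.0],
    "video": [0.5, 0.5],
}
# All-zero scoring weights give every modality the same score.
uniform_scoring = {name: [0.0, 0.0] for name in word_features}
fused_vector, attention_weights = modality_attention_fusion(
    word_features, uniform_scoring
)
```

Because the attention weights form a distribution over modalities, they can be inspected directly for each input, which is the kind of visualization the abstract credits with improving interpretability.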

Original language: English (US)
Title of host publication: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 537-545
Number of pages: 9
ISBN (Electronic): 9781450356657
State: Published - Oct 15 2018
Event: 26th ACM Multimedia conference, MM 2018 - Seoul, Korea, Republic of
Duration: Oct 22 2018 – Oct 26 2018

Publication series

Name: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference

Other

Other: 26th ACM Multimedia conference, MM 2018
Country/Territory: Korea, Republic of
City: Seoul
Period: 10/22/18 – 10/26/18

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction

Keywords

  • Attention Mechanism
  • Hierarchical Encoder-Decoder Structure
  • Human Conversation Analysis
  • Sensor Fusion
