Human conversation analysis using attentive multimodal networks with hierarchical encoder-decoder

Yue Gu, Shiyu Fu, Xinyu Li, Kangning Yang, Kaixiang Huang, Shuhong Chen, Moliang Zhou, Ivan Marsic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Scopus citations


Human conversation analysis is challenging because the meaning can be expressed through words, intonation, or even body language and facial expression. We introduce a hierarchical encoder-decoder structure with an attention mechanism for conversation analysis. The hierarchical encoder learns word-level features from video, audio, and text data that are then formulated into conversation-level features. The corresponding hierarchical decoder is able to predict different attributes at given time instances. To integrate multiple sensory inputs, we introduce a novel fusion strategy with modality attention. We evaluated our system on published emotion recognition, sentiment analysis, and speaker trait analysis datasets. Our system outperformed previous state-of-the-art approaches in both classification and regression tasks on three datasets. We also outperformed previous approaches in generalization tests on two commonly used datasets. We achieved comparable performance in predicting co-existing labels using the proposed model instead of multiple individual models. In addition, the easily visualized modality and temporal attention demonstrated that the proposed attention mechanism helps feature selection and improves model interpretability.
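
As a rough illustration of what attention-weighted fusion across modalities can look like, the Python sketch below scores one feature vector per modality (video, audio, text), turns the scores into weights with a softmax, and returns the weighted sum. The abstract does not give the scoring function or any dimensions, so the dot-product scoring, the names modality_attention_fusion and score_vector, and the random inputs are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


def modality_attention_fusion(features, score_vector):
    """Fuse per-modality features with attention weights.

    features:     array of shape (num_modalities, dim), one row per modality
    score_vector: array of shape (dim,); a hypothetical learned parameter
                  (assumed here, not taken from the paper)
    Returns the fused vector of shape (dim,) and the attention weights.
    """
    scores = features @ score_vector   # one scalar score per modality
    weights = softmax(scores)          # positive weights that sum to 1
    fused = weights @ features         # attention-weighted sum of the rows
    return fused, weights


# Toy inputs standing in for video, audio, and text features.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 8))
score_vector = rng.normal(size=8)

fused, weights = modality_attention_fusion(features, score_vector)
print("attention weights (video, audio, text):", weights)
print("fused feature shape:", fused.shape)
```

Because the weights are explicit and sum to one, they can be inspected directly per input, which is the kind of visualization the abstract associates with interpretability.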

Original language: English (US)
Title of host publication: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Number of pages: 9
ISBN (Electronic): 9781450356657
State: Published - Oct 15 2018
Event: 26th ACM Multimedia conference, MM 2018 - Seoul, Korea, Republic of
Duration: Oct 22 2018 to Oct 26 2018

Publication series

Name: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference

Other: 26th ACM Multimedia conference, MM 2018
Country/Territory: Korea, Republic of

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction


Keywords

  • Attention Mechanism
  • Hierarchical Encoder-Decoder Structure
  • Human Conversation Analysis
  • Sensor Fusion

