DETECTING HIGHLIGHTED VIDEO CLIPS THROUGH EMOTION-ENHANCED AUDIO-VISUAL CUES

Linkang Hu, Weidong He, Le Zhang, Tong Xu, Hui Xiong, Enhong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Recent years have witnessed growing research interest in video highlight detection. Existing studies mainly focus on detecting highlights in user-generated videos with simple topics, based on visual content. However, relying solely on visual features limits the ability of conventional methods to capture highlights in videos with more complicated semantics, such as movies. Therefore, we propose to mine the emotional information in video sounds to enhance highlight detection. Specifically, we design a novel emotion-enhanced framework with multi-stage fusion to detect highlights in complex videos. Along this line, we first extract multi-grained features from the audio waves. Then, a tailor-designed intra-modal fusion is applied to the audio features to obtain an emotional representation. Furthermore, a cross-modal fusion is developed to generate a comprehensive representation of each clip by merging the audio emotional representations and visual features. This representation is then used to predict the highlight probability. Finally, extensive experiments on real-world datasets demonstrate the effectiveness of our method.
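The multi-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the feature extractors, fusion operators, or dimensions, so everything below is an assumption made for illustration. In particular, the window sizes, the log-energy features, the attention-style intra-modal fusion, the concatenation-based cross-modal fusion, the random weights, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def multi_grained_audio_features(wave, window_sizes=(400, 1600, 6400), dim=8):
    """Summarize a waveform at several temporal granularities.

    Assumption: log-energy per window stands in for real audio features,
    and the waveform is long enough to give at least `dim` frames per window.
    """
    feats = []
    for w in window_sizes:
        n = len(wave) // w
        frames = wave[: n * w].reshape(n, w)
        log_energy = np.log1p((frames ** 2).mean(axis=1))  # one value per frame
        # Pool the frame sequence into a fixed-length vector of `dim` bins.
        bins = np.array_split(log_energy, dim)
        feats.append(np.array([b.mean() for b in bins]))
    return np.stack(feats)  # shape: (num_granularities, dim)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def intra_modal_fusion(audio_feats, query):
    """Attention-weighted sum over granularities (hypothetical fusion choice)."""
    weights = softmax(audio_feats @ query)  # one weight per granularity
    return weights @ audio_feats  # shape: (dim,)


def cross_modal_fusion(audio_repr, visual_feat, W, b):
    """Concatenate both modalities and project to a clip representation."""
    joint = np.concatenate([audio_repr, visual_feat])
    return np.tanh(W @ joint + b)


def highlight_probability(clip_repr, v, c):
    """Logistic scoring head mapping the clip representation to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(v @ clip_repr + c)))


# Toy inputs: random noise in place of real audio and visual features.
wave = rng.standard_normal(64000)
visual_feat = rng.standard_normal(16)

audio_feats = multi_grained_audio_features(wave)
audio_repr = intra_modal_fusion(audio_feats, rng.standard_normal(8))
clip_repr = cross_modal_fusion(
    audio_repr, visual_feat, rng.standard_normal((12, 24)), np.zeros(12)
)
p = highlight_probability(clip_repr, rng.standard_normal(12), 0.0)
print(f"highlight probability: {p:.3f}")
```

In a trained system the random weights would be learned from labeled clips; the sketch only shows how the three stages (multi-grained extraction, intra-modal fusion, cross-modal fusion) could feed a single probability output.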

Original language: English (US)
Title of host publication: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665438643
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China
Duration: Jul 5 2021 to Jul 9 2021

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Country/Territory: China
City: Shenzhen
Period: 7/5/21 to 7/9/21

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications

Keywords

  • multimodal fusion
  • multimodal video analysis
  • video highlight detection
