Multimodal attention network for trauma activity recognition from spoken language and environmental sound

Yue Gu, Ruiyu Zhang, Xinwei Zhao, Shuhong Chen, Jalal Abdulbaqi, Ivan Marsic, Megan Cheng, Randall S. Burd

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) during trauma resuscitation. Previous work has mainly focused on using various sensor data including image, RFID, and vital signals to generate the trauma event log. However, spoken language and environmental sound, which contain rich communication and contextual information necessary for trauma team cooperation, are still largely ignored. In this paper, we propose a multimodal attention network (MAN) that uses both verbal transcripts and environmental audio stream as input; the model extracts textual and acoustic features using a multi-level multi-head attention module, and forms a final shared representation for trauma activity classification. We evaluated the proposed architecture on 75 actual trauma resuscitation cases collected from a hospital. We achieved 71.8% accuracy with 0.702 F1 score, demonstrating that our proposed architecture is useful and efficient. These results also show that using spoken language and environmental audio indeed helps identify hard-to-recognize activities, compared to previous approaches. We also provide a detailed analysis of the performance and generalization of the proposed multimodal attention network.
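The fusion idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the feature dimensions, random weights, mean pooling, concatenation-based fusion, and every function name below are assumptions chosen for illustration, and the attention layer omits the usual output projection for brevity. It only shows how multi-head self-attention could be applied separately to text and audio features before the two are merged into one shared vector for activity classification.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned weights or extracted features."""
    scale = rows ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """Scaled dot-product self-attention over x (seq_len x d_model).

    Each head attends over its own slice of the projected features;
    head outputs are concatenated back to d_model columns.
    """
    d_model = len(x[0])
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    out = [[] for _ in x]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        for i, q_row in enumerate(q):
            scores = [
                sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi])) / math.sqrt(d_head)
                for k_row in k
            ]
            weights = softmax(scores)
            out[i].extend(
                sum(w * v_row[c] for w, v_row in zip(weights, v)) for c in range(lo, hi)
            )
    return out

def mean_pool(rows):
    """Average a sequence of vectors into one vector (assumed pooling choice)."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def classify(text_feats, audio_feats, params, num_heads=2):
    """Attend over each modality, fuse by concatenation, return class probabilities."""
    text_ctx = multi_head_self_attention(text_feats, *params["text"], num_heads=num_heads)
    audio_ctx = multi_head_self_attention(audio_feats, *params["audio"], num_heads=num_heads)
    shared = mean_pool(text_ctx) + mean_pool(audio_ctx)  # list concatenation
    logits = matmul([shared], params["w_out"])[0]
    return softmax(logits)

rng = random.Random(0)
d_model, num_classes = 8, 3  # illustrative sizes, not taken from the paper
params = {
    "text": [random_matrix(rng, d_model, d_model) for _ in range(3)],
    "audio": [random_matrix(rng, d_model, d_model) for _ in range(3)],
    "w_out": random_matrix(rng, 2 * d_model, num_classes),
}
text_feats = random_matrix(rng, 6, d_model)    # e.g. 6 transcript tokens
audio_feats = random_matrix(rng, 10, d_model)  # e.g. 10 audio frames
probs = classify(text_feats, audio_feats, params)
print(len(probs), round(sum(probs), 6))
```

With untrained random weights the probabilities carry no meaning; the sketch is only meant to make the data flow concrete (per-modality attention, pooling, shared representation, classifier).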

Original language: English (US)
Title of host publication: 2019 IEEE International Conference on Healthcare Informatics, ICHI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538691380
State: Published - Jun 2019
Event: 7th IEEE International Conference on Healthcare Informatics, ICHI 2019 - Xi'an, China
Duration: Jun 10 2019 - Jun 13 2019

Publication series

Name: 2019 IEEE International Conference on Healthcare Informatics, ICHI 2019

Conference

Conference: 7th IEEE International Conference on Healthcare Informatics, ICHI 2019
Country/Territory: China
City: Xi'an
Period: 6/10/19 - 6/13/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Health Informatics
  • Biomedical Engineering

Keywords

  • Environmental sound
  • Multimodal attention network
  • Spoken language
  • Trauma activity recognition
