TY - GEN
T1 - Video based activity recognition in trauma resuscitation
AU - Chakraborty, Ishani
AU - Elgammal, Ahmed
AU - Burd, Randall S.
PY - 2013
Y1 - 2013
N2 - We present a system for automated transcription of trauma resuscitation in the emergency department (ED). Using a ceiling-mounted, single-camera video recording, our goal is to track and transcribe the medical procedures performed during resuscitation of a patient, the time instances of their initiation and their temporal durations. In this multi-agent, multi-task setting, we represent procedures as high-level concepts composed of low-level features based on the patient's pose, scene dynamics, clinician motions and device locations. In particular, the low-level features are transformed into intermediate action attributes (e.g., 'hand grasping of an object of interest') and are used as building blocks to describe procedures. Procedures are expressed as first-order logic statements that capture spatio-temporal attribute interactions compactly in an activity grammar. The probabilities from feature observations and the logical semantics are combined probabilistically in a Markov Logic Network (MLN). At runtime, a Markov Network is dynamically constructed representing hypothesized procedures, spatio-temporal relationships and attribute probabilities. Inference on this network determines the most consistent sequence of procedures over time. Our activity model is modular and extensible to a multitude of sensor inputs and detection methods. The method is thus adaptable to many activity recognition problems. In this paper, we demonstrate our approach using videos of simulated trauma resuscitations. The accuracy of the results confirms the suitability of our framework.
AB - We present a system for automated transcription of trauma resuscitation in the emergency department (ED). Using a ceiling-mounted, single-camera video recording, our goal is to track and transcribe the medical procedures performed during resuscitation of a patient, the time instances of their initiation and their temporal durations. In this multi-agent, multi-task setting, we represent procedures as high-level concepts composed of low-level features based on the patient's pose, scene dynamics, clinician motions and device locations. In particular, the low-level features are transformed into intermediate action attributes (e.g., 'hand grasping of an object of interest') and are used as building blocks to describe procedures. Procedures are expressed as first-order logic statements that capture spatio-temporal attribute interactions compactly in an activity grammar. The probabilities from feature observations and the logical semantics are combined probabilistically in a Markov Logic Network (MLN). At runtime, a Markov Network is dynamically constructed representing hypothesized procedures, spatio-temporal relationships and attribute probabilities. Inference on this network determines the most consistent sequence of procedures over time. Our activity model is modular and extensible to a multitude of sensor inputs and detection methods. The method is thus adaptable to many activity recognition problems. In this paper, we demonstrate our approach using videos of simulated trauma resuscitations. The accuracy of the results confirms the suitability of our framework.
UR - https://www.scopus.com/pages/publications/84881499692
U2 - 10.1109/FG.2013.6553758
DO - 10.1109/FG.2013.6553758
M3 - Conference contribution
AN - SCOPUS:84881499692
SN - 9781467355452
T3 - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
BT - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
PB - IEEE Computer Society
T2 - 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
Y2 - 22 April 2013 through 26 April 2013
ER -