TY - GEN
T1 - Focusing on What Matters
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
AU - Zhang, Wenjin
AU - Li, Keyi
AU - Yang, Sen
AU - Yuan, Sifan
AU - Marsic, Ivan
AU - Sippel, Genevieve J.
AU - Kim, Mary S.
AU - Burd, Randall S.
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Trauma is a leading cause of mortality worldwide, with about 20% of these deaths being preventable. Most of these preventable deaths result from errors during the initial resuscitation of injured patients. Decision support has been evaluated as an approach to support teams during this phase to reduce errors. Existing systems require manual data entry and monitoring, which makes tasks challenging to accomplish in a time-critical setting. This paper identified the specific challenges of achieving effective decision support in trauma resuscitation based on computer vision techniques, including complex backgrounds, crowded scenes, fine-grained activities, and a scarcity of labeled data. To address the first three challenges, the proposed system involved an actor tracker that identifies individuals, allowing the system to focus on actor-specific features. Video Masked Autoencoder (Video-MAE) was used to overcome the issue of insufficient labeled data. This approach enables self-supervised learning using unlabeled video content, improving feature representation for medical activities. For more reliable performance, an ensemble fusion method was introduced. This technique combines predictions from consecutive video clips and different actors. Our method outperformed existing approaches in identifying fine-grained activities, providing a solution for activity recognition in trauma resuscitation and similar complex domains.
KW - medical activity recognition
KW - self-supervised learning
KW - video understanding
UR - http://www.scopus.com/inward/record.url?scp=85206454183&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85206454183&partnerID=8YFLogxK
U2 - 10.1109/CVPRW63382.2024.00500
DO - 10.1109/CVPRW63382.2024.00500
M3 - Conference contribution
AN - SCOPUS:85206454183
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 4950
EP - 4958
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -