Audio-visual speaker detection using dynamic Bayesian networks

Ashutosh Garg, Vladimir Pavlović, James M. Rehg

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

23 Scopus citations

Abstract

The development of human-computer interfaces poses a challenging problem: actions and intentions of different users have to be inferred from sequences of noisy and ambiguous sensory data. Temporal fusion of multiple sensors can be efficiently formulated using dynamic Bayesian networks (DBNs). The DBN framework allows the power of statistical inference and learning to be combined with contextual knowledge of the problem. We demonstrate the use of DBNs in tackling the problem of audio/visual speaker detection. "Off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors) are optimally fused along with contextual information in a DBN architecture that infers instances when an individual is speaking. Results obtained in the setup of an actual human-machine interaction system (Genie Casino Kiosk) demonstrate the superiority of our approach over a static, context-free fusion architecture.
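As a rough illustration of the difference between temporal (DBN) fusion and static per-frame fusion, the sketch below runs forward filtering in a two-state model (not speaking / speaking) with two binary sensors assumed independent given the state. It is a minimal sketch and not the authors' architecture: the two-state structure, the choice of sensors, and every probability are assumptions made for illustration, and the contextual variables mentioned in the abstract are omitted.

```python
import numpy as np

# All numbers below are illustrative assumptions, not values from the paper.
PRIOR = np.array([0.5, 0.5])            # P(S_0): [not speaking, speaking]
TRANSITION = np.array([[0.9, 0.1],      # P(S_t | S_{t-1} = not speaking)
                       [0.1, 0.9]])     # P(S_t | S_{t-1} = speaking)
# P(sensor fires | state); rows = sensors, columns = states.
SENSOR_ON = np.array([[0.3, 0.8],       # hypothetical mouth-motion detector
                      [0.2, 0.7]])      # hypothetical "audio not silent" detector


def frame_likelihood(obs):
    """P(obs | S) for each state, assuming sensors are independent given S."""
    obs = np.asarray(obs, dtype=bool)
    per_sensor = np.where(obs[:, None], SENSOR_ON, 1.0 - SENSOR_ON)
    return per_sensor.prod(axis=0)


def static_fusion(observations):
    """Per-frame posterior P(speaking | obs_t), ignoring temporal context."""
    out = []
    for obs in observations:
        post = PRIOR * frame_likelihood(obs)
        out.append(post[1] / post.sum())
    return np.array(out)


def dbn_filter(observations):
    """Forward filtering: P(speaking | obs_1..t) under the two-state model."""
    belief = PRIOR.copy()
    out = []
    for obs in observations:
        predicted = TRANSITION.T @ belief            # propagate through dynamics
        belief = predicted * frame_likelihood(obs)   # weight by current evidence
        belief /= belief.sum()                       # renormalize to a distribution
        out.append(belief[1])
    return np.array(out)


# Both sensors fire, fire again, drop out for one frame, then fire again.
frames = [[1, 1], [1, 1], [0, 0], [1, 1]]
static_post = static_fusion(frames)
dbn_post = dbn_filter(frames)
```

With these assumed parameters, the static estimate reacts fully to the single-frame dropout, while the filtered estimate stays closer to its previous belief because the transition model makes an abrupt change of state unlikely.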

Original language: English (US)
Title of host publication: Proceedings - 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000
Publisher: IEEE Computer Society
Pages: 384-390
Number of pages: 7
ISBN (Print): 0769505805, 9780769505800
DOIs: https://doi.org/10.1109/AFGR.2000.840663
State: Published - Jan 1 2000
Event: 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000 - Grenoble, France
Duration: Mar 28 2000 → Mar 30 2000

Publication series

Name: Proceedings - 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000

Other

Other: 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000
Country: France
City: Grenoble
Period: 3/28/00 → 3/30/00

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition

Cite this

Garg, A., Pavlović, V., & Rehg, J. M. (2000). Audio-visual speaker detection using dynamic Bayesian networks. In Proceedings - 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000 (pp. 384-390). [840663] (Proceedings - 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000). IEEE Computer Society. https://doi.org/10.1109/AFGR.2000.840663