TY - JOUR
T1 - Conditional models for contextual human motion recognition
AU - Sminchisescu, Cristian
AU - Kanaujia, Atul
AU - Metaxas, Dimitris
N1 - Funding Information:
The authors acknowledge the support of Zhiguo Li with experiments and preparing the database and thank the anonymous reviewers for valuable comments. Cristian Sminchisescu gives special thanks to Allan Jepson at the University of Toronto, for many insightful discussions and feedback on the topics presented in this paper. C.S. has been partly funded by NSF Grant IIS-0535140.
PY - 2006/11
Y1 - 2006/11
N2 - We describe algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative structures like the hidden Markov model (HMM). Therefore, they have to make simplifying, often unrealistic assumptions about the conditional independence of observations given the motion class labels, and cannot accommodate rich overlapping features of the observation or long-term contextual dependencies among observations at multiple timesteps. This makes them prone to myopic failures in recognizing many human motions, because even the transition between simple human activities naturally has temporal segments of ambiguity and overlap. The correct interpretation of these sequences requires more holistic, contextual decisions, where the estimate of an activity at a particular timestep could be constrained by longer windows of observations, prior and even posterior to that timestep. This would not be computationally feasible with an HMM, which requires the enumeration of a number of observation sequences exponential in the size of the context window. In this work we follow a different philosophy: instead of restrictively modeling the complex image generation process (the observation), we work with models that can take it as an input without restriction, hence condition on it. Conditional models like the proposed CRFs seamlessly represent contextual dependencies and have computationally attractive properties: they support efficient, exact recognition using dynamic programming, and their parameters can be learned using convex optimization.
We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show not only how these models can successfully classify diverse human activities like walking, jumping, running, picking, or dancing, but also how they can discriminate among subtle motion styles like normal walks and wander walks.
AB - We describe algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative structures like the hidden Markov model (HMM). Therefore, they have to make simplifying, often unrealistic assumptions about the conditional independence of observations given the motion class labels, and cannot accommodate rich overlapping features of the observation or long-term contextual dependencies among observations at multiple timesteps. This makes them prone to myopic failures in recognizing many human motions, because even the transition between simple human activities naturally has temporal segments of ambiguity and overlap. The correct interpretation of these sequences requires more holistic, contextual decisions, where the estimate of an activity at a particular timestep could be constrained by longer windows of observations, prior and even posterior to that timestep. This would not be computationally feasible with an HMM, which requires the enumeration of a number of observation sequences exponential in the size of the context window. In this work we follow a different philosophy: instead of restrictively modeling the complex image generation process (the observation), we work with models that can take it as an input without restriction, hence condition on it. Conditional models like the proposed CRFs seamlessly represent contextual dependencies and have computationally attractive properties: they support efficient, exact recognition using dynamic programming, and their parameters can be learned using convex optimization.
We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show not only how these models can successfully classify diverse human activities like walking, jumping, running, picking, or dancing, but also how they can discriminate among subtle motion styles like normal walks and wander walks.
KW - Conditional models
KW - Discriminative models
KW - Feature selection
KW - Hidden Markov models
KW - Human motion recognition
KW - Markov random fields
KW - Multiclass logistic regression
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=33749993686&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749993686&partnerID=8YFLogxK
U2 - 10.1016/j.cviu.2006.07.014
DO - 10.1016/j.cviu.2006.07.014
M3 - Article
AN - SCOPUS:33749993686
VL - 104
SP - 210
EP - 220
JO - Computer Vision and Image Understanding
JF - Computer Vision and Image Understanding
SN - 1077-3142
IS - 2-3 SPEC. ISS.
ER -