The manifestation of language in space poses special challenges for computer-based recognition. Prior approaches to sign recognition have not leveraged knowledge of linguistic structures and constraints, in part because of limitations in the computational models they employed. They have also focused on recognizing limited classes of signs. No existing system can recognize signs of all morphophonological types, or even discriminate among these types in continuous signing. By integrating several computational approaches, informed by knowledge of the linguistic properties of manual signs and supported by a large, existing, linguistically annotated corpus, the team will develop a robust, comprehensive framework for sign recognition from video streams of natural, continuous signing. Signed languages differ fundamentally from spoken languages in their linguistic structure: they unfold in 4D, with spatio-temporal dependencies and multiple production channels. These differences are critical to computer-based recognition because different types of signs, e.g., finger-spelled items, lexical signs, and classifier constructions, require different recognition strategies. Linguistic properties will be leveraged here for (i) segmentation and categorization of significantly different types of signs and then (ii) recognition of the segmented sign sequences, although this second enterprise will necessarily be limited in scope within the project period. Using 3D hand pose estimation from a tracker developed by the team, the project will attain significant gains in tracking accuracy, robustness, and computational efficiency. This 3D information is expected to greatly improve recognition results compared with schemes that use only 2D information. The estimated 3D information from the tracker will feed the proposed hierarchical Conditional Random Field (CRF) based recognition, allowing signs that differ in their linguistic composition to be tracked and recognized.
Since other signed languages also rely on a very similar sign typology, this technology will be readily extensible to computer-based recognition of those languages. This linguistically based hierarchical framework for ASL sign recognition, built on techniques directly applicable to other signed languages as well, provides for the first time a way to model and analyze both the discrete and continuous aspects of signing, and enables recognition strategies suited to signs of linguistically different composition. The approach will also allow future integration of the discrete and continuous aspects of facial gestures with manual signing, further improving computer-based modeling and analysis of ASL. The lack of such a framework has held back sign language recognition and generation. Advances in this area will, in turn, have far-ranging benefits for Universal Access and improved communication with the Deaf. Further applications of this technology include automated computer recognition and analysis of non-verbal communication in general, security applications, human-computer interfaces, and virtual and augmented reality. In fact, these techniques have potential utility for any human-centered application with continuous and discrete aspects. The proposed approach will offer ways to address similar problems in other domains characterized by multidimensional, complex spatio-temporal data that require the incorporation of domain knowledge. The products of this research, including software, videos, and annotations, will be made publicly available for use in research and education.
Effective start/end date: 9/1/10 → 8/31/13
- National Science Foundation (NSF)