CHS: MEDIUM: COLLABORATIVE RESEARCH: SCALABLE INTEGRATION OF DATA-DRIVEN AND MODEL-BASED METHODS FOR LARGE VOCABULARY SIGN RECOGNITION AND SEARCH

Project Details

Description

It is surprisingly difficult to look up an unfamiliar sign in American Sign Language (ASL). Most ASL dictionaries list signs in alphabetical order based on approximate English translations, so a user who does not understand a sign or know its English translation would not know how to find it. ASL lacks a written form, and therefore lacks any intuitive alphabetical sorting based on such a writing system. Although some dictionaries offer alternative ways to search for a sign, based on explicit specification of various properties, a user must often still look through hundreds of pictures of signs to find a match to the unfamiliar sign (if it is present at all in that dictionary). This research will create a framework enabling the development of a user-friendly, video-based sign-lookup interface, for use with online ASL video dictionaries and resources, and for facilitation of ASL annotation. Input will consist of either a webcam recording of a sign by the user or user identification of the start and end frames of a sign in a digital video.

To test the efficacy of the new tools in real-world applications, the team will partner with the leading producer of pedagogical materials for ASL instruction in high schools and colleges, which is developing the first multimedia ASL dictionary with video-based ASL definitions for signs. The lookup interface will be used experimentally to search the ASL dictionary in ASL classes at Boston University and RIT. Project outcomes will revolutionize how deaf children, students learning ASL, and families with deaf children search ASL dictionaries. They will accelerate research on ASL linguistics and technology by increasing the efficiency, accuracy, and consistency of annotation of ASL videos through video-based sign lookup. They will also lay the groundwork for future technologies to benefit deaf users, such as search by video example through ASL video collections, or ASL-to-English translation, for which sign recognition is a precursor. The new linguistically annotated video data and software tools will be shared publicly, for use by others in linguistic and computer science research, as well as in education.

Sign recognition from video remains an open and difficult problem because of the nonlinearities involved in recognizing 3D structures from 2D video and the complex linguistic organization of sign languages. The linguistic parameters relevant to sign production and discrimination include hand configuration and orientation, location relative to the body or in signing space, movement trajectory, and, in some cases, facial expressions and head movements. An additional complication is that signs belonging to different classes have distinct internal structures; they are thus subject to different linguistic constraints and require distinct recognition strategies, yet prior research has generally failed to address these distinctions. The challenges are compounded by inter- and intra-signer variation and, in continuous signing, by co-articulation effects (i.e., influence from adjacent signs) with respect to several of the above parameters. Purely data-driven approaches are ill-suited to sign recognition given the limited quantities of available, consistently annotated data and the complexity of the linguistic structures involved, which are hard to infer. Prior research has, for this reason, generally focused on selected aspects of the problem, often restricting the work to a limited vocabulary and therefore resulting in methods that are not scalable. More importantly, few if any methods involve 4D (spatio-temporal) modeling or attention to the linguistic properties of specific types of signs.

A new approach to computer-based recognition of ASL from video is needed. In this research, the approach will be to build a new hybrid, scalable computational framework for sign identification from a large vocabulary, which has never before been achieved. This research will strategically combine state-of-the-art computer vision, machine-learning methods, and linguistic modeling. It will leverage the team's existing publicly shared ASL corpora and Sign Bank - linguistically annotated and categorized video recordings produced by native signers - which will be augmented to meet the requirements of this project.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
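To make the hybrid idea above concrete, the following is a purely illustrative sketch (the class, field names, and scoring rule are assumptions for exposition, not the project's actual software): it represents the linguistic parameters the abstract lists for a sign and combines a data-driven recognizer's score with a simple model-based constraint tied to the sign's class.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical representation of the parameters listed in the abstract:
# hand configuration/orientation, location, movement trajectory, non-manual markers.
@dataclass
class SignObservation:
    handshape: str                   # hand configuration, e.g. "B" or "5"
    orientation: str                 # palm/finger orientation
    location: str                    # place of articulation relative to body or signing space
    movement: List[str]              # movement trajectory, segment by segment
    nonmanual: Optional[str] = None  # facial expression / head movement, when relevant
    sign_class: str = "unknown"      # lexical class; different classes obey different constraints

def hybrid_score(data_driven_score: float,
                 observation: SignObservation,
                 allowed_handshapes: Set[str]) -> float:
    """Combine a learned recognizer's score with one model-based linguistic
    constraint (hypothetical rule: this candidate's class permits only
    certain handshapes). Violating the constraint prunes the candidate."""
    if observation.handshape not in allowed_handshapes:
        return 0.0
    return data_driven_score

# Example usage with made-up values:
obs = SignObservation(handshape="B", orientation="palm-down",
                      location="neutral-space", movement=["arc-right"])
print(hybrid_score(0.82, obs, allowed_handshapes={"B", "5"}))
```

The design intent sketched here is only that linguistic constraints filter or re-weight the hypotheses produced by data-driven recognizers; the actual framework developed by the project may organize this very differently.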
Status: Active
Effective start/end date: 8/1/18 – 7/31/21

Funding

  • National Science Foundation (NSF)
