Project Details


This collaborative research project is being undertaken by Dr. Ahmed Elgammal of Rutgers University and Dr. Sherif Abdou of Cairo University in Egypt. Its goal is an automated system that takes as input the pronunciations of language learners along with dynamic images of their faces as they speak. Using this information, the proposed system will assess whether the user has accurately pronounced a particular word or phrase in the target language. The system will then give the user specific feedback on the quality of the pronunciation, with suggestions for improvement. Specifically, the research will entail analyzing the pronunciation errors made by non-native speakers of English and Arabic.
The researchers anticipate that this speech recognition system will be robust enough to withstand the large mispronunciations that may occur when the user is a non-native speaker. Additionally, the system will be designed to assign a pass or fail score to the user's utterance, detect where the error occurred, and classify the error in order to give adequate feedback on how the pronunciation should be altered. To create such a system, novel algorithms will be developed for combining visual cues with audio cues for the task of mispronunciation detection. This will be achieved by deploying lip tracking algorithms and studying the correlation between lip movement and speech through style-dependent models of individual users. The research plan will also result in the collection of a large audio-visual database of phonemes produced by native and non-native Arabic and English speakers. Developing the mispronunciation detection for English and Arabic first is significant because the two languages are both phonetically rich and different from one another, which leads to the assumption that the mispronunciation detection has a wide working spectrum.
The societal benefits of efficient speech training systems are significant. In addition to numerous benefits for language learning, the expected outcome of this collaborative research is a tool that would be one of the core components of any Computer Aided Language Learning (CALL) system. Such an outcome could provide a cost-effective and personalized tool for hearing-impaired speakers, for whom acquiring an acceptable pronunciation can be challenging.
Effective start/end date: 10/1/09 - 9/30/12


  • National Science Foundation (NSF)

