Techniques for language identification for hybrid Arabic-English document images

Ahmed M. Elgammal, Mohamed A. Ismail

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

42 Scopus citations

Abstract

Because the characteristics of Arabic differ from those of the Romance and Anglo-Saxon languages, recognizing documents written in a mix of these languages requires that the language of the text be identified prior to the recognition phase. In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script are presented and evaluated. These techniques address the language identification problem at the word level and at the text-line level. The characteristics of horizontal projection profiles and run-length histograms for text written in both languages are the basic features underlying these techniques. Solving this problem is very important in building bilingual document image analysis systems capable of processing documents that mix Arabic with Romance and Anglo-Saxon languages.
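The two features named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the three techniques or their decision rules, so the code below only computes a horizontal projection profile and a horizontal run-length histogram for a binarized image; the function names, the horizontal run direction, and the toy image are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def horizontal_projection(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def run_length_histogram(image):
    """Map each horizontal foreground run length to how often it occurs."""
    hist = Counter()
    for row in image:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                hist[run] += 1  # a run just ended at a background pixel
                run = 0
        if run:
            hist[run] += 1  # a run that reaches the right edge
    return dict(hist)


# Hypothetical 5x8 binarized text-line fragment (1 = ink, 0 = background).
toy = [
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
]

profile = horizontal_projection(toy)
hist = run_length_histogram(toy)
print(profile)
print(hist)
```

A word-level or text-line-level classifier would be built on top of such features; which statistics of the profile or histogram separate the two scripts is not stated in the abstract and is left open here.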

Original language: English (US)
Title of host publication: Proceedings - 6th International Conference on Document Analysis and Recognition, ICDAR 2001
Publisher: IEEE Computer Society
Pages: 1100-1104
Number of pages: 5
ISBN (Electronic): 0769512631
State: Published - 2001
Externally published: Yes
Event: 6th International Conference on Document Analysis and Recognition, ICDAR 2001 - Seattle, United States
Duration: Sep 10 2001 → Sep 13 2001

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2001-January
ISSN (Print): 1520-5363

Other

Other: 6th International Conference on Document Analysis and Recognition, ICDAR 2001
Country: United States
City: Seattle
Period: 9/10/01 → 9/13/01

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
