Script Recognition System for Camera-Based Multilingual Document Images
Text is one of the greatest inventions of humankind, and it has played a vital role in human life since ancient times. Text carries rich and meaningful information, which makes it a crucial element in the development of computer-vision applications. The symbolic representation of a language in text is known as its script. To develop an Optical Character Recognition (OCR) system, the script must first be identified from the document image: processing a document image requires locating the text in the document and identifying its script in order to select the appropriate OCR system. Once the computer has recognized the underlying script of the document, tasks such as editing, searching, and indexing can be performed on the document image.
Traditionally, documents were scanned with flat-bed scanners. The trend has gradually shifted from scanners to digital cameras with high-resolution lenses. Compared with a traditional scanner, a camera is lightweight and small, can capture scene text such as sign boards displayed along roads and vehicle number plates, and can capture a document without touching the hard copy, which matters for documents that are too fragile to handle. Camera-captured documents, however, pose challenging problems such as uneven illumination, blur, shadow, perspective distortion, and out-of-focus capture, among others.
Script identification is an important area of document image analysis. It is broadly categorized according to whether the text in the document image is printed or handwritten. To date, many of the algorithms presented in the literature perform recognition for a specific language, and such OCR systems do not work for a document containing more than one script. Hence, script identification is crucial for the automatic processing of textual documents in multilingual information management. Identifying the script in a document image is the primary step toward processing multilingual documents: the scripts of a multilingual document must be identified before its text is submitted to the appropriate OCR system.