Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Laxmi Gewali

Third Committee Member

Wolfgang Bein

Fourth Committee Member

Ashok Singh

Optical Character Recognition (OCR) technology converts images of text into machine-readable, non-graphical text. This makes the content available for editing and for processing by language technology software such as machine translation, text comprehension, question-answering systems, and search engines. Although OCR systems continue to become more accurate, several problems persist with low-resolution source images and with images that have multicolored backgrounds. Consequently, OCR output often requires further correction to improve its accuracy for downstream applications. The character accuracy of OCR-generated text is known to affect natural language processing tasks, including information retrieval, named-entity recognition, and sentiment analysis.

OCR post-processing consists of three fundamental stages: identifying incorrect words, generating a list of candidate corrections, and selecting the correct word from that list to replace each erroneous word. In this work, we use large language models and word embeddings to detect recognition errors introduced by OCR software. We also use the generative capabilities of these language models to suggest candidate corrections for those errors. Finally, we develop tools that can be used to further improve OCR post-processing technologies.
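The three post-processing stages described above can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the method of this dissertation: a word-frequency lexicon and Levenshtein edit distance stand in for the large language models and word embeddings, and every function name (`detect_errors`, `generate_candidates`, `select_correction`, `post_process`) and the sample vocabulary are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def detect_errors(tokens, lexicon):
    """Stage 1: flag tokens absent from the lexicon.
    Assumption: a plain lexicon lookup stands in for an LLM/embedding detector."""
    return [i for i, t in enumerate(tokens) if t.lower() not in lexicon]


def generate_candidates(word, lexicon, max_dist=2):
    """Stage 2: propose lexicon words within max_dist edits.
    Assumption: edit distance stands in for LLM-generated candidates."""
    return [w for w in lexicon if edit_distance(word.lower(), w) <= max_dist]


def select_correction(word, candidates, freq):
    """Stage 3: choose the closest candidate, breaking ties by higher frequency."""
    if not candidates:
        return word  # nothing plausible found: leave the token unchanged
    return min(candidates, key=lambda w: (edit_distance(word.lower(), w), -freq[w]))


def post_process(text, freq):
    """Run detection, candidate generation, and selection over whitespace tokens.
    Corrected tokens are returned in the lexicon's (lowercase) form."""
    lexicon = set(freq)
    tokens = text.split()
    for i in detect_errors(tokens, lexicon):
        candidates = generate_candidates(tokens[i], lexicon)
        tokens[i] = select_correction(tokens[i], candidates, freq)
    return " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical toy frequency table; a real system would draw on a large corpus.
    freq = Counter("the quick brown fox jumps over the lazy dog the end".split())
    print(post_process("tbe quick brovvn fox jumps ovcr the lazy dog", freq))
```

On the hypothetical input `tbe quick brovvn fox jumps ovcr the lazy dog`, the sketch returns `the quick brown fox jumps over the lazy dog`. A lexicon lookup cannot catch real-word errors (a misrecognized token that is itself a valid word), which is one motivation for context-aware models in the detection stage.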


Large language models; Natural language processing; Post-processing technologies; Optical Character Recognition; Machine translation


Artificial Intelligence and Robotics | Computer Sciences

File Size

3200 KB

Degree Grantor

University of Nevada, Las Vegas




IN COPYRIGHT. For more information about this rights statement, please visit