Award Date
8-15-2023
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Committee Member
Kazem Taghva
Second Committee Member
Laxmi Gewali
Third Committee Member
Wolfgang Bein
Fourth Committee Member
Ashok Singh
Number of Pages
79
Abstract
Optical Character Recognition (OCR) technology converts images of text into an electronically readable, non-graphical format. This allows the content to be edited and otherwise manipulated by language technology software such as machine translation, text comprehension, query-answering systems, and search engines. While OCR systems continue to progress towards greater precision, several complications persist when dealing with low-resolution source images or those with multicolored backgrounds. Consequently, OCR-derived text requires additional refinement to improve its accuracy, which benefits various downstream applications. It is recognized that the character accuracy of OCR-generated text may influence certain natural language processing tasks, including Information Retrieval, Named-Entity Recognition, and Sentiment Analysis.
Post-processing techniques for OCR consist of three fundamental stages: identifying incorrect words, producing a list of potential corrections, and selecting the correct word from the list to replace the erroneous word. In this work, we use large language models and word embeddings to detect recognition errors caused by the OCR software. In addition, we use the generative capabilities of these language models to suggest correction candidates that may fix the errors. Our work also includes the development of tools that can be used to further improve OCR post-processing technologies.
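The three post-processing stages named in the abstract (error detection, candidate generation, candidate selection) can be illustrated with a minimal sketch. This is not the dissertation's method: it assumes a toy vocabulary lookup and string similarity from Python's standard `difflib` module as stand-ins for the large language models and word embeddings the work uses. The names `VOCAB`, `detect_errors`, `generate_candidates`, `select_candidate`, and `post_process` are hypothetical and chosen only for illustration.

```python
import difflib
import re

# Hypothetical toy vocabulary (an assumption for this sketch). A real system
# would rely on a large lexicon or a language model instead.
VOCAB = {
    "optical", "character", "recognition", "converts",
    "images", "of", "text", "into", "editable",
}


def detect_errors(tokens, vocab):
    """Stage 1: return the indices of tokens that are not in the vocabulary."""
    return [i for i, tok in enumerate(tokens) if tok.lower() not in vocab]


def generate_candidates(token, vocab, n=3):
    """Stage 2: propose up to n vocabulary words that resemble the token."""
    return difflib.get_close_matches(token.lower(), sorted(vocab), n=n, cutoff=0.6)


def select_candidate(token, candidates):
    """Stage 3: pick the most similar candidate, or keep the token if none exist."""
    if not candidates:
        return token
    return max(
        candidates,
        key=lambda c: difflib.SequenceMatcher(None, token.lower(), c).ratio(),
    )


def post_process(text, vocab=VOCAB):
    """Run the three stages over whitespace/punctuation-separated tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    for i in detect_errors(tokens, vocab):
        candidates = generate_candidates(tokens[i], vocab)
        tokens[i] = select_candidate(tokens[i], candidates)
    return " ".join(tokens)


if __name__ == "__main__":
    print(post_process("optical charactcr recogniti0n"))
```

A vocabulary lookup cannot catch real-word errors (a misrecognized word that happens to be another valid word), which is one reason context-aware models such as those described in the abstract are attractive for the detection and selection stages.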
Keywords
Large language models; Natural language processing; Post-processing technologies; Optical Character Recognition; Machine translation
Disciplines
Artificial Intelligence and Robotics | Computer Sciences
File Format
File Size
3200 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Hajiali, Mahdi, "OCR Post-processing Using Large Language Models" (2023). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4811.
http://dx.doi.org/10.34917/36910880
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/