Award Date

8-15-2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Laxmi Gewali

Third Committee Member

Wolfgang Bein

Fourth Committee Member

Ashok Singh

Number of Pages

Abstract

Optical Character Recognition (OCR) technology converts images of text into a machine-readable, non-graphical text format. This allows language technology software, such as machine translation, text comprehension, query-answering systems, and search engines, to edit and otherwise manipulate the content. Although OCR systems continue to improve in accuracy, several complications persist with low-resolution source images or images with multicolored backgrounds. Consequently, OCR-derived text needs additional refinement to improve its accuracy for downstream applications. It is recognized that the character accuracy of OCR-generated text may affect certain natural language processing tasks, including Information Retrieval, Named-Entity Recognition, and Sentiment Analysis.

Post-processing techniques for OCR consist of three fundamental stages: identifying incorrect words, producing a list of potential corrections, and selecting the correct word from the list to replace the erroneous word. In this work, we use large language models and word embeddings to detect recognition errors caused by the OCR software. In addition, we use the generative capabilities of these language models to suggest correction candidates for the errors. Our work also includes the development of tools that can be used to further improve OCR post-processing technologies.
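The three stages named in the abstract (error detection, candidate generation, candidate selection) can be illustrated with a minimal sketch. This is not the dissertation's method: as an assumption for illustration, it swaps the large language models and word embeddings for a toy vocabulary lookup and string similarity from Python's standard `difflib` module. The names `VOCAB`, `detect_errors`, `generate_candidates`, `select_correction`, and `post_process` are hypothetical.

```python
import difflib

# Hypothetical toy vocabulary (an assumption for illustration only);
# the dissertation uses language models and embeddings instead.
VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def detect_errors(tokens, vocab):
    """Stage 1: return indices of tokens not found in the vocabulary."""
    return [i for i, tok in enumerate(tokens) if tok.lower() not in vocab]


def generate_candidates(token, vocab, n=3):
    """Stage 2: propose up to n in-vocabulary words similar to the token."""
    return difflib.get_close_matches(token.lower(), sorted(vocab), n=n, cutoff=0.6)


def select_correction(token, candidates):
    """Stage 3: take the best-ranked candidate, or keep the token if none exist."""
    return candidates[0] if candidates else token


def post_process(text, vocab=VOCAB):
    """Run the three stages over whitespace-separated tokens.

    Punctuation and original casing are not handled in this sketch.
    """
    tokens = text.split()
    for i in detect_errors(tokens, vocab):
        candidates = generate_candidates(tokens[i], vocab)
        tokens[i] = select_correction(tokens[i], candidates)
    return " ".join(tokens)


if __name__ == "__main__":
    print(post_process("the qu1ck brovvn fox"))  # the quick brown fox
```

In a model-based variant, stage 1 would presumably score each token in context rather than against a fixed word list, and stage 2 would draw candidates from the model's generated output; the control flow between the stages would stay the same.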

Keywords

Large language models; Natural language processing; Post-processing technologies; Optical Character Recognition; Machine translation

Disciplines

Artificial Intelligence and Robotics | Computer Sciences

File Format

pdf

File Size

3200 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Repository Citation

Hajiali, Mahdi, "OCR Post-processing Using Large Language Models" (2023). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4811.
http://dx.doi.org/10.34917/36910880

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Included in

Artificial Intelligence and Robotics Commons
