Award Date

8-1-2012

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Kazem Taghva

Second Committee Member

Ajoy K. Datta

Third Committee Member

Laxmi P. Gewali

Fourth Committee Member

Venkatesan Muthukumar

Number of Pages

Abstract

In this thesis, we describe a postprocessing system for text generated by Optical Character Recognition (OCR). A second-order Hidden Markov Model (HMM) is used to detect and correct OCR-related errors. The second-order HMM was chosen because it keeps track of bigrams, allowing the model to represent the system more accurately. In experiments with 159,733 characters of training data and 5,688 characters of test data, the model corrected 43.38% of the errors with a precision of 75.34%. However, the precision value indicates that the model introduced some new errors, decreasing the correction percentage to 26.4%.

Keywords

Errors; Hidden Markov Models; Optical character recognition; Second Order HMM

Disciplines

Computer Sciences

File Format

pdf

Degree Grantor

University of Nevada, Las Vegas

Language

English

Repository Citation

Poudel, Srijana, "Post Processing of Optically Recognized Text via Second Order Hidden Markov Model" (2012). UNLV Theses, Dissertations, Professional Papers, and Capstones. 1694.
http://dx.doi.org/10.34917/4332675

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Included in

Computer Sciences Commons
