Award Date
8-1-2012
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Kazem Taghva
Second Committee Member
Ajoy K. Datta
Third Committee Member
Laxmi P. Gewali
Fourth Committee Member
Venkatesan Muthukumar
Number of Pages
49
Abstract
In this thesis, we describe a postprocessing system for text generated by Optical Character Recognition (OCR). A second-order Hidden Markov Model (HMM) is used to detect and correct OCR-related errors. The second-order HMM was chosen because it keeps track of bigrams, which lets the model represent the system more accurately. In experiments with 159,733 characters of training data and 5,688 characters of test data, the model corrected 43.38% of the errors with a precision of 75.34%. However, the precision value indicates that the model also introduced some new errors, decreasing the correction percentage to 26.4%.
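For readers unfamiliar with the technique, the following is a minimal sketch of how decoding with a second-order HMM can work. It is not the implementation from the thesis. It assumes that hidden states are the true characters, that observations are the OCR output characters, and that each transition is conditioned on the two preceding states. The function name `viterbi2`, the table layout, the probability floor, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def viterbi2(obs, states, init, trans1, trans2, emit, floor=1e-12):
    """Most likely hidden sequence under a second-order HMM (illustrative sketch).

    init[s]           : P(s_1 = s)
    trans1[(a, b)]    : P(s_2 = b | s_1 = a)
    trans2[(a, b, c)] : P(s_t = c | s_{t-2} = a, s_{t-1} = b)
    emit[(s, o)]      : P(o_t = o | s_t = s)
    Missing table entries fall back to a small floor probability (an assumption).
    """
    def lg(table, key):
        return math.log(table.get(key, floor))

    if not obs:
        return []
    if len(obs) == 1:
        return [max(states, key=lambda s: lg(init, s) + lg(emit, (s, obs[0])))]

    # delta[(a, b)]: best log-probability of any path ending in the state pair (a, b)
    delta = {
        (a, b): lg(init, a) + lg(emit, (a, obs[0]))
        + lg(trans1, (a, b)) + lg(emit, (b, obs[1]))
        for a, b in product(states, repeat=2)
    }
    back = []  # one back-pointer table per observation from the third onward
    for o in obs[2:]:
        new_delta, ptr = {}, {}
        for b, c in product(states, repeat=2):
            # choose the best state two steps back for the pair (b, c)
            best_a = max(states, key=lambda a: delta[(a, b)] + lg(trans2, (a, b, c)))
            new_delta[(b, c)] = (
                delta[(best_a, b)] + lg(trans2, (best_a, b, c)) + lg(emit, (c, o))
            )
            ptr[(b, c)] = best_a
        delta = new_delta
        back.append(ptr)

    # start from the best final pair and follow back-pointers to the beginning
    path = list(max(delta, key=delta.get))
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path


# Toy model (hypothetical numbers): the OCR engine sometimes reads "e" as "c".
states = ["t", "h", "e", "c"]
init = {"t": 1.0}
trans1 = {("t", "h"): 1.0}
trans2 = {("t", "h", "e"): 0.9, ("t", "h", "c"): 0.01}
emit = {
    ("t", "t"): 1.0, ("h", "h"): 1.0,
    ("e", "e"): 0.8, ("e", "c"): 0.2,
    ("c", "c"): 0.9, ("c", "e"): 0.1,
}
corrected = "".join(viterbi2(list("thc"), states, init, trans1, trans2, emit))
print(corrected)
```

In this toy example the observed string "thc" is decoded as "the": the emission table alone favors "c" for the last character, but the transition conditioned on the preceding pair ("t", "h") outweighs it. This is the sense in which conditioning on two previous states lets the model use bigram context.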
Keywords
Errors; Hidden Markov Models; Optical character recognition; Second Order HMM
Disciplines
Computer Sciences
File Format
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Poudel, Srijana, "Post Processing of Optically Recognized Text via Second Order Hidden Markov Model" (2012). UNLV Theses, Dissertations, Professional Papers, and Capstones. 1694.
http://dx.doi.org/10.34917/4332675
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/