Master of Science in Computer Science
Second Committee Member: Ajoy K. Datta
Third Committee Member: Laxmi P. Gewali
In this thesis, we describe a postprocessing system for text generated by Optical Character Recognition (OCR). A second-order Hidden Markov Model (HMM) is used to detect and correct OCR-related errors. We chose a second-order HMM because its transitions depend on the two preceding states, which lets it keep track of bigrams so that the model can represent the system more accurately. In experiments with 159,733 training characters and 5,688 test characters, the model corrected 43.38% of the errors with a precision of 75.34%. However, the precision value indicates that the model also introduced some new errors, which reduced the net correction rate to 26.4%.
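The abstract does not describe the decoding step. As a minimal sketch, assuming the usual formulation in which the hidden states are the intended characters and the observations are the OCR output, a second-order HMM uses transitions P(s_t | s_{t-2}, s_{t-1}) and emissions P(o_t | s_t), and Viterbi decoding runs over pairs of states. The function `viterbi2` and the toy probabilities below are hypothetical illustrations, not the thesis's implementation or its trained model.

```python
import math
from itertools import product


def viterbi2(obs, states, start_p, trans1_p, trans2_p, emit_p):
    """Most likely hidden sequence under a second-order HMM (hypothetical sketch).

    start_p[s]          : P(s_1 = s)
    trans1_p[(a, b)]    : P(s_2 = b | s_1 = a)
    trans2_p[(a, b, c)] : P(s_t = c | s_{t-2} = a, s_{t-1} = b)
    emit_p[(s, o)]      : P(o_t = o | s_t = s)
    Missing entries are treated as probability 0 (log-probability -inf).
    """
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(obs)
    if n == 0:
        return []
    if n == 1:
        return [max(states, key=lambda s: lg(start_p.get(s, 0)) + lg(emit_p.get((s, obs[0]), 0)))]

    # delta[(a, b)]: best log-prob of a path whose last two states are a, b.
    delta = {}
    for a, b in product(states, repeat=2):
        delta[(a, b)] = (lg(start_p.get(a, 0)) + lg(emit_p.get((a, obs[0]), 0))
                         + lg(trans1_p.get((a, b), 0)) + lg(emit_p.get((b, obs[1]), 0)))

    backptrs = []  # one dict per step t >= 2: (b, c) -> best preceding state a
    for t in range(2, n):
        new_delta, bp = {}, {}
        for b, c in product(states, repeat=2):
            best_a = max(states, key=lambda a: delta[(a, b)] + lg(trans2_p.get((a, b, c), 0)))
            new_delta[(b, c)] = (delta[(best_a, b)] + lg(trans2_p.get((best_a, b, c), 0))
                                 + lg(emit_p.get((c, obs[t]), 0)))
            bp[(b, c)] = best_a
        delta = new_delta
        backptrs.append(bp)

    # Backtrack from the best final pair of states.
    path = list(max(delta, key=delta.get))
    for bp in reversed(backptrs):
        path.insert(0, bp[(path[0], path[1])])
    return path


# Toy model (made-up numbers): OCR sometimes confuses 'e' and 'c'.
STATES = ["t", "h", "e", "c"]
START = {"t": 1.0}
TRANS1 = {("t", "h"): 1.0}
TRANS2 = {("t", "h", "e"): 0.9, ("t", "h", "c"): 0.1}
EMIT = {("t", "t"): 1.0, ("h", "h"): 1.0,
        ("e", "e"): 0.8, ("e", "c"): 0.2,
        ("c", "c"): 0.8, ("c", "e"): 0.2}
```

With this toy model, `"".join(viterbi2("thc", STATES, START, TRANS1, TRANS2, EMIT))` gives `"the"`: the emission probabilities alone favour reading the last character as `c`, but the context `th` makes `e` more likely overall (0.9 × 0.2 = 0.18 versus 0.1 × 0.8 = 0.08). Working over state pairs makes each step cost O(|S|^3), compared with O(|S|^2) for a first-order model.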
Keywords: Errors; Hidden Markov Models; Optical character recognition; Second Order HMM
Poudel, Srijana, "Post Processing of Optically Recognized Text via Second Order Hidden Markov Model" (2012). UNLV Theses, Dissertations, Professional Papers, and Capstones. 1694.