Award Date
1-1-1996
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Number of Pages
70
Abstract
Systems that predict the optical character recognition (OCR) accuracy of a given OCR system on an input image were developed. Seven features associated with image defects were identified and utilized. Two kinds of nonparametric classification engines, one based on the nearest neighbor rule and one based on a neural network, were implemented. The performance of these systems was compared to that of an older heuristic-based system using a cost model of a large-scale document conversion process and a test data set consisting of 502 pages. The results show that the new classifiers performed better than the heuristic-based system, and that the neural network-based system outperformed the nearest neighbor-based system. These new systems can be used to reduce the cost of a large-scale document conversion process by separating good-quality pages, which are sent to OCR, from degraded images, which are sent to manual data entry.
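As an illustration only, the nearest neighbor rule mentioned in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the abstract does not name the seven features, so the feature vectors, the "good"/"poor" labels, and the function names `nearest_neighbor_label` and `route_page` are all hypothetical.

```python
import math

# Hypothetical training set: each entry pairs seven made-up image-defect
# feature values (scaled to 0..1) with an assumed page-quality label.
TRAINING = [
    ((0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1), "good"),
    ((0.2, 0.1, 0.2, 0.1, 0.0, 0.1, 0.2), "good"),
    ((0.8, 0.9, 0.7, 0.8, 0.9, 0.7, 0.8), "poor"),
    ((0.9, 0.7, 0.8, 0.9, 0.8, 0.9, 0.7), "poor"),
]


def nearest_neighbor_label(sample, training_set):
    """Return the label of the training vector closest to `sample` (1-NN rule)."""
    best_label, best_dist = None, math.inf
    for features, label in training_set:
        dist = math.dist(sample, features)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def route_page(sample, training_set=TRAINING):
    """Send pages predicted 'good' to OCR and all others to manual data entry."""
    if nearest_neighbor_label(sample, training_set) == "good":
        return "OCR"
    return "manual entry"
```

With these made-up values, a page whose features all lie near 0.15 would be routed to OCR, and one whose features all lie near 0.85 would be routed to manual entry.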
Keywords
Accuracy; OCR; Predictor; Statistical; Techniques
Controlled Subject
Computer science; Artificial intelligence
File Format
File Size
2314.24 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Gonzalez, Juan Manuel, "Predictor of OCR accuracy using statistical techniques" (1996). UNLV Retrospective Theses & Dissertations. 589.
http://dx.doi.org/10.25669/3twf-9oel
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/