Master of Science (MS)
Systems were developed that predict the optical character recognition (OCR) accuracy a given OCR system will achieve on an input image. Seven features associated with image defects were identified and used. Two kinds of nonparametric classification engines were implemented: one based on the nearest neighbor rule and one based on a neural network. The performance of these systems was compared with that of an older heuristic-based system, using a cost model of a large-scale document conversion process and a test data set of 502 pages. The results show that the new classifiers performed better than the heuristic-based system, and that the neural network-based system outperformed the nearest neighbor-based system. These new systems can be used to reduce the cost of a large-scale document conversion process by separating good-quality pages, which go to OCR, from degraded images, which go to manual data entry.
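As an illustration only, the nearest neighbor rule mentioned in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the thesis implementation: the abstract does not name the seven features, so the feature values, the training pages, the Euclidean distance metric, the labels "ocr" and "manual", and the names `nearest_neighbor_label`, `TRAINING`, and `route_page` are all hypothetical.

```python
import math


def nearest_neighbor_label(query, training_set):
    """Return the label of the training example closest to `query` (1-NN rule).

    Euclidean distance is an assumption; the abstract does not state the metric.
    """
    best_label, best_dist = None, math.inf
    for features, label in training_set:
        dist = math.dist(query, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical training pages: seven made-up defect features per page,
# each paired with the route the page should take.
TRAINING = [
    ((0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.0), "ocr"),
    ((0.2, 0.1, 0.1, 0.0, 0.1, 0.0, 0.1), "ocr"),
    ((0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.9), "manual"),
    ((0.7, 0.9, 0.8, 0.6, 0.9, 0.7, 0.8), "manual"),
]


def route_page(features):
    """Send a page to OCR or to manual data entry based on its nearest neighbor."""
    return nearest_neighbor_label(features, TRAINING)
```

In this sketch, a page whose seven features all sit near the low end of the made-up scale is routed to OCR, and one near the high end is routed to manual entry. A cost model like the one the abstract describes would then weigh the cost of each wrong routing decision.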
Accuracy; OCR; Predictor; Statistical; Techniques
Computer science; Artificial intelligence
University of Nevada, Las Vegas
Gonzalez, Juan Manuel, "Predictor of OCR accuracy using statistical techniques" (1996). UNLV Retrospective Theses & Dissertations. 589.