Award Date

1-1-1996

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Number of Pages

Abstract

Systems were developed that predict the optical character recognition (OCR) accuracy a given OCR system will achieve on an input image. Seven features associated with image defects were identified and used. Two kinds of nonparametric classification engines were implemented: one based on the nearest neighbor rule and one based on a neural network. The performance of these systems was compared with that of an older heuristic-based system using a cost model of a large-scale document conversion process and a test data set of 502 pages. The results show that the new classifiers performed better than the heuristic-based system, and that the neural network-based system outperformed the nearest-neighbor-based system. These new systems can be used to reduce the cost of a large-scale document conversion process by separating good-quality pages, which are sent to OCR, from degraded images, which are sent to manual data entry.

Keywords

Accuracy; OCR; Predictor; Statistical; Techniques

Controlled Subject

Computer science; Artificial intelligence

File Format

pdf

File Size

2314.24 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Permissions

If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.

Repository Citation

Gonzalez, Juan Manuel, "Predictor of OCR accuracy using statistical techniques" (1996). UNLV Retrospective Theses & Dissertations. 589.
http://dx.doi.org/10.25669/3twf-9oel

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Predictor of OCR accuracy using statistical techniques

Author