Award Date
1-1-1996
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Committee Member
Thomas A. Nartker
Number of Pages
81
Abstract
Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This "OCR-generated" text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
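The comparison described in the abstract can be illustrated with a minimal sketch. It is not taken from the dissertation: the function names are hypothetical, unit insertion and deletion costs are assumed, and character accuracy is assumed to be (n - d) / n, where n is the length of the correct string and d is the edit distance. The sketch also checks the standard identity that, when a substitution costs as much as a deletion plus an insertion, the edit distance equals m + n - 2 * LCS for strings of lengths m and n.

```python
def edit_distance(correct: str, generated: str, sub_cost: int = 1) -> int:
    """Minimum cost to turn `generated` into `correct`.

    Insertions and deletions cost 1 each; substitutions cost `sub_cost`
    (an assumed cost function, chosen only for illustration).
    """
    prev = list(range(len(generated) + 1))  # distances from the empty prefix of `correct`
    for i, c in enumerate(correct, start=1):
        curr = [i]
        for j, g in enumerate(generated, start=1):
            curr.append(min(
                prev[j] + 1,                                 # character missing from the output
                curr[j - 1] + 1,                             # spurious character in the output
                prev[j - 1] + (0 if c == g else sub_cost),   # match or substitution
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of `a` and `b`."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def character_accuracy(correct: str, generated: str) -> float:
    """Assumed definition: (n - d) / n, with n = len(correct), d = edit distance."""
    n = len(correct)
    return (n - edit_distance(correct, generated)) / n


correct = "recognition"
generated = "rec0gniton"   # one substituted and one missing character
d_unit = edit_distance(correct, generated)
d_lcs = edit_distance(correct, generated, sub_cost=2)
lcs = lcs_length(correct, generated)
accuracy = character_accuracy(correct, generated)
```

With these inputs, the unit-cost distance counts one substitution and one insertion, while the cost function with `sub_cost=2` agrees with the LCS-based formula. The cost function actually used in the ISRI tests may differ from both.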
Keywords
Accuracy; Character; Character Recognition; Matching; Measuring; Page; Reading; Recognition; String; String Matching; System
Controlled Subject
Computer science
File Format
File Size
1751.04 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Rice, Stephen Vincent, "Measuring the accuracy of page-reading systems" (1996). UNLV Retrospective Theses & Dissertations. 3014.
http://dx.doi.org/10.25669/hfa8-0cqv
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/