Doctor of Philosophy (PhD)
First Committee Member
Thomas A. Nartker
Number of Pages
Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This "OCR-generated" text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
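As a rough illustration of the ideas in the abstract, the following Python sketch computes a string edit distance, an LCS length, and a character accuracy figure. It is not taken from the dissertation. The function names are hypothetical, the dynamic-programming routines are the standard textbook ones, and the accuracy formula (n - errors) / n, with n the length of the correct string and errors the unit-cost edit distance, is an assumption about how character accuracy is defined.

```python
def edit_distance(correct, generated, sub_cost=1):
    """Edit distance with unit-cost insertions and deletions.

    sub_cost is the cost of substituting one character for another
    (an illustrative parameter, not the dissertation's notation).
    """
    prev = list(range(len(generated) + 1))
    for i, c in enumerate(correct, start=1):
        curr = [i]
        for j, g in enumerate(generated, start=1):
            curr.append(min(
                prev[j] + 1,                                 # delete c
                curr[j - 1] + 1,                             # insert g
                prev[j - 1] + (0 if c == g else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of a longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def character_accuracy(correct, generated):
    """Assumed definition: (n - errors) / n, where n = len(correct)."""
    n = len(correct)
    if n == 0:
        raise ValueError("correct string must be non-empty")
    return (n - edit_distance(correct, generated)) / n
```

One cost function related to the LCS is the one in which a substitution costs as much as a deletion plus an insertion. Under that choice (sub_cost=2 here), the edit distance equals len(a) + len(b) - 2 * lcs_length(a, b), so computing one quantity gives the other.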
Accuracy; Character; Character Recognition; Matching; Measuring; Page; Reading; Recognition; String; String Matching; System
University of Nevada, Las Vegas
Rice, Stephen Vincent, "Measuring the accuracy of page-reading systems" (1996). UNLV Retrospective Theses & Dissertations. 3014.