Award Date


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Committee Member

Thomas A. Nartker

Number of Pages



Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This "OCR-generated" text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
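As an illustration of the string editing problem described in the abstract, the following is a minimal sketch and not the dissertation's own implementation. It assumes the standard dynamic program for edit distance with configurable insertion, deletion, and substitution costs, and it assumes that character accuracy is computed as (n - errors) / n, where n is the length of the correct string and errors is the unit-cost edit distance. The function names and the cost parameters are hypothetical. It also shows the well-known link between one family of cost functions and the LCS: with insertion and deletion cost 1 and substitution cost 2, the distance equals len(a) + len(b) - 2 * LCS(a, b).

```python
def edit_distance(correct, generated, ins=1, dele=1, sub=1):
    """Weighted edit distance between two strings (row-by-row dynamic program).

    ins  = cost of a character present in `generated` but not in `correct`
    dele = cost of a character present in `correct` but missing from `generated`
    sub  = cost of replacing one character with a different one
    All three costs are illustrative assumptions.
    """
    m, n = len(correct), len(generated)
    # Row 0: turning the empty prefix of `correct` into each prefix of `generated`.
    prev = [j * ins for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [i * dele] + [0] * n
        for j in range(1, n + 1):
            step = 0 if correct[i - 1] == generated[j - 1] else sub
            curr[j] = min(prev[j] + dele,      # drop correct[i-1]
                          curr[j - 1] + ins,   # extra generated[j-1]
                          prev[j - 1] + step)  # match or substitute
        prev = curr
    return prev[n]


def lcs_length(a, b):
    """Length of a longest common subsequence, derived from the edit distance.

    With ins = dele = 1 and sub = 2, a substitution is never cheaper than a
    deletion plus an insertion, so distance = len(a) + len(b) - 2 * LCS.
    """
    return (len(a) + len(b) - edit_distance(a, b, 1, 1, 2)) // 2


def character_accuracy(correct, generated):
    """Assumed definition: (n - errors) / n with unit-cost errors."""
    n = len(correct)
    if n == 0:
        raise ValueError("correct string must be non-empty")
    return (n - edit_distance(correct, generated)) / n
```

For example, `character_accuracy("recognition", "rec0gnitlon")` counts two substitution errors against eleven correct characters. Note that under this assumed definition the value can be negative when the generated text contains many spurious characters.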


Accuracy; Character; Character Recognition; Matching; Measuring; Page; Reading; Recognition; String; String Matching; System

Controlled Subject

Computer science

File Format


File Size

1751.04 KB

Degree Grantor

University of Nevada, Las Vegas






IN COPYRIGHT.