Award Date

1-1-1996

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Committee Member

Thomas A. Nartker

Number of Pages

81

Abstract

Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This "OCR-generated" text is represented as a string and compared with the correct string to determine the accuracy of the process. The string editing problem is applied to find an optimal correspondence between these two strings under an appropriate cost function. The ISRI annual test of page-reading systems uses the following performance measures, each defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is partitioned into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
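The abstract ties OCR accuracy to string edit distance and the LCS. The following is a minimal Python sketch of those ideas, not the dissertation's own code. It assumes that character accuracy is (n − errors)/n, where n is the length of the correct string and errors is the unit-cost edit distance. It also assumes that trimming the longest common prefix and suffix is an acceptable stand-in for the "linear-time preprocessing step," which the abstract does not specify. All function names (`trim_common_affixes`, `lcs_length`, `edit_distance`, `character_accuracy`) are hypothetical.

```python
def trim_common_affixes(a: str, b: str) -> tuple[str, str, int]:
    """Strip the longest common prefix and suffix in linear time.

    Assumption: stands in for the unspecified linear-time preprocessing.
    Every stripped character belongs to some LCS, so the count of
    stripped characters can simply be added to the LCS of the remainder.
    """
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    j = 0
    # Stop before the suffix would overlap the already-matched prefix.
    while j < limit - i and a[-1 - j] == b[-1 - j]:
        j += 1
    return a[i:len(a) - j], b[i:len(b) - j], i + j


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence (dynamic programming, O(mn))."""
    a, b, common = trim_common_affixes(a, b)
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for k, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[k - 1] + 1)
            else:
                cur.append(max(prev[k], cur[k - 1]))
        prev = cur
    return common + prev[-1]


def edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Minimum total cost of insertions, deletions and substitutions turning a into b."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * dele]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + dele,                          # delete ca
                cur[j - 1] + ins,                        # insert cb
                prev[j - 1] + (0 if ca == cb else sub),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def character_accuracy(correct: str, ocr: str) -> float:
    """(n - errors) / n with unit-cost errors (assumed definition).

    Can be negative when the OCR output contains many spurious insertions.
    """
    n = len(correct)
    if n == 0:
        raise ValueError("correct string must be non-empty")
    return (n - edit_distance(ocr, correct)) / n


correct = "the quick brown fox"
ocr = "tne quick brovvn fox"
print(lcs_length(correct, ocr))          # 17
print(edit_distance(ocr, correct))       # 3
print(edit_distance(ocr, correct, sub=2))  # 5 = 20 + 19 - 2 * 17
print(round(character_accuracy(correct, ocr), 3))  # 0.842
```

The `sub=2` call shows one well-known LCS-related cost function. With unit insertion and deletion costs and a substitution cost of at least 2, the edit distance equals len(a) + len(b) − 2·LCS(a, b). Substitutions are then never cheaper than a deletion plus an insertion. How this relates to the dissertation's equivalence classes of cost functions is not stated in the abstract.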

Keywords

Accuracy; Character; Character Recognition; Matching; Measuring; Page; Reading; Recognition; String; String Matching; System

Controlled Subject

Computer science

File Format

pdf

File Size

1751.04 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Permissions

If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.

Identifier

https://doi.org/10.25669/hfa8-0cqv
