Electrical & Computer Engineering Faculty Research

Evaluation of Model-Based Retrieval Effectiveness with OCR Text

Kazem Taghva, University of Nevada, Las Vegas
Julie Borsack, University of Nevada, Las Vegas
Allen Condit, University of Nevada, Las Vegas

Document Type

Article

Publication Date

1-1996

Publication Title

ACM Transactions on Information Systems

Volume

14

Issue

1

First page number:

64

Last page number:

93

Abstract

We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval. More specifically, we show that average precision and recall are not affected by OCR errors across systems for several collections. The collections used in these experiments include both actual OCR-generated text and standard information retrieval collections corrupted through the simulation of OCR errors. Both the actual and simulation experiments include full-text and abstract-length documents. We also demonstrate that the ranking and feedback methods associated with these models are generally not robust enough to deal with OCR errors. It is further shown that the OCR errors and garbage strings generated from the mistranslation of graphic objects increase the size of the index by a wide margin. We not only point out problems that can arise from applying OCR text within an information retrieval environment but also suggest solutions to overcome some of these problems.

Keywords

Error correction; Feedback; Optical character recognition; Ranking algorithms

Disciplines

Electrical and Computer Engineering | Engineering

Language

English

Permissions

Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.

Repository Citation

Taghva, K., Borsack, J., Condit, A. (1996). Evaluation of Model-Based Retrieval Effectiveness with OCR Text. ACM Transactions on Information Systems, 14(1), 64-93.

Digital Scholarship@UNLV
