Electrical & Computer Engineering Faculty Research

Effects of OCR Errors on Ranking and Feedback Using the Vector Space Model

Kazem Taghva, University of Nevada, Las Vegas
Julie Borsack, University of Nevada, Las Vegas
Allen Condit, University of Nevada, Las Vegas

Document Type

Article

Publication Date

5-1996

Publication Title

Information Processing & Management

Volume

32

Issue

3

First page number:

317

Last page number:

327

Abstract

We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full-text document collection when the OCR version is compared with its corresponding corrected set. We do, however, see divergence between the relevant-document rankings of the OCR and corrected collections under different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
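To make the ranking-divergence point concrete, here is a minimal sketch of cosine-normalized vector space ranking applied to a clean and an OCR-degraded copy of the same toy collection. It is an illustration under stated assumptions, not the authors' experimental setup: the documents and query are hypothetical, tokenization is plain lowercase whitespace splitting, and weights are raw term frequencies with no idf (the paper compares several weighting combinations, which this sketch does not reproduce).

```python
import math
from collections import Counter

def tf_vector(text):
    # Assumption: raw term-frequency weights from lowercase whitespace tokens
    # (no idf, stemming, or stopword removal).
    return Counter(text.lower().split())

def cosine(q, d):
    # Inner product divided by the product of the two Euclidean vector lengths.
    dot = sum(w * d.get(t, 0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

def rank(query, docs):
    # Return document ids ordered by descending cosine score.
    q = tf_vector(query)
    return sorted(docs, key=lambda doc_id: cosine(q, tf_vector(docs[doc_id])), reverse=True)

# Hypothetical toy collection: "A" is badly degraded by OCR, "B" scanned cleanly.
query = "nuclear waste"
clean_docs = {
    "A": "nuclear waste storage nuclear waste",
    "B": "nuclear waste policy debate",
}
ocr_docs = {
    "A": "nuclear wa5te storage nuc1ear waste",  # two misrecognized tokens
    "B": "nuclear waste policy debate",
}

print(rank(query, clean_docs))  # ['A', 'B']
print(rank(query, ocr_docs))    # ['B', 'A']
```

In this toy case the cosine score of document A falls from about 0.94 to about 0.63 while B stays at about 0.71, so the order flips. Each misrecognized token both removes a query match from the numerator and becomes a separate term in A's vector, changing the length used for normalization.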

Disciplines

Electrical and Computer Engineering | Engineering

Language

English

Permissions

Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.

Repository Citation

Taghva, K., Borsack, J., Condit, A. (1996). Effects of OCR Errors on Ranking and Feedback Using the Vector Space Model. Information Processing & Management, 32(3), 317-327.
