Effects of OCR Errors on Ranking and Feedback Using the Vector Space Model
Document Type
Article
Publication Date
5-1996
Publication Title
Information Processing & Management
Volume
32
Issue
3
First page number
317
Last page number
327
Abstract
We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full-text document collection when the OCR version is compared to its corresponding corrected set. We do, however, see divergence between the relevant-document rankings of the OCR and corrected collections under different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
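The abstract does not spell out why cosine normalization matters. The sketch below shows one plausible mechanism under stated assumptions. It uses raw term-frequency weights (no idf), a binary query, and invented token lists, so it does not reproduce the paper's data or its weighting combinations. The idea is that OCR errors can split repeated occurrences of a word into several distinct garbled terms. That changes the length of the document vector, and therefore the normalized weight of every term that was recognized correctly.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Raw term-frequency weights for one document (assumed scheme: tf only, no idf)."""
    return Counter(tokens)


def cosine_normalize(weights):
    """Divide every weight by the Euclidean length of the document vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}


def score(query_terms, doc_weights):
    """Inner product of a binary query vector with a document vector."""
    return sum(doc_weights.get(term, 0.0) for term in query_terms)


# Hypothetical documents: the same text before and after OCR.
# In the OCR version, two occurrences of "model" were misread as distinct garbage terms.
corrected = ["retrieval", "model", "model", "model"]
ocr = ["retrieval", "model", "rnodel", "mode1"]

query = ["retrieval"]
for name, doc in [("corrected", corrected), ("ocr", ocr)]:
    raw = score(query, tf_weights(doc))
    cos = score(query, cosine_normalize(tf_weights(doc)))
    print(f"{name}: raw={raw} cosine={cos:.3f}")
# corrected: raw=1 cosine=0.316
# ocr: raw=1 cosine=0.500
```

Without normalization, both versions score the same for the query, because "retrieval" was recognized correctly in each. With cosine normalization, the corrected vector has length sqrt(1 + 3^2) = sqrt(10), while the OCR vector has length sqrt(4) = 2. As a result, the same correctly recognized term receives different weights, and the two collections can rank the same document differently.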
Disciplines
Electrical and Computer Engineering | Engineering
Language
English
Permissions
Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (the author's peer-reviewed manuscript) is allowed and available, or if publisher policy changes, the item will be deposited.
Repository Citation
Taghva, K., Borsack, J., & Condit, A. (1996). Effects of OCR Errors on Ranking and Feedback Using the Vector Space Model. Information Processing & Management, 32(3), 317-327.