"Results of Applying Probabilistic IR to OCR Text" by Kazem Taghva, Julie Borsack et al.

Electrical and Computer Engineering Faculty Presentations

Title

Results of Applying Probabilistic IR to OCR Text

Authors

Kazem Taghva, University of Nevada, Las Vegas
Julie Borsack, University of Nevada, Las Vegas
Allen Condit, University of Nevada, Las Vegas

Meeting name

Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval

Document Type

Conference Proceeding

Publication Date

1994

Abstract

Character accuracy of optically recognized text is considered a basic measure for evaluating OCR devices. More broadly, another fundamental measure of an OCR device's quality is whether the text it generates can be used to retrieve information. In this study, we evaluate retrieval effectiveness on OCR text databases using a probabilistic IR system, and we compare these retrieval results with those obtained from the manually corrected equivalents. We show that there is no statistically significant difference in precision and recall across the graded accuracy levels produced by three OCR devices. However, characteristics of the OCR data have side effects that could cause unstable results with this IR model; in particular, we found that individual queries can be greatly affected. Knowing these qualities of OCR text, we compensate for them by applying an automatic post-processing system that improves retrieval effectiveness.

Keywords

Information retrieval; Optical character recognition; Optical character recognition devices – Evaluation; Optical pattern recognition

Permissions

Use Find in Your Library, contact the author, or interlibrary loan to garner a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.

Repository Citation

Taghva, K., Borsack, J., Condit, A. (1994, January). Results of Applying Probabilistic IR to OCR Text. Presentation at Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval,

Available at: https://digitalscholarship.unlv.edu/ece_presentations/36

Digital Scholarship@UNLV