Electrical & Computer Engineering Faculty Research

The Effects of OCR Error on the Extraction of Private Information

Kazem Taghva, University of Nevada, Las Vegas
Russell Beckley, University of Nevada, Las Vegas
Jeffrey Coombs, University of Nevada, Las Vegas

Editors

Horst Bunke; A. Lawrence Spitz

Document Type

Chapter

Publication Date

2006

Publication Title

Document Analysis Systems VII

Publisher

Springer Berlin Heidelberg

First page number:

348

Last page number:

357

Abstract

OCR error has been shown not to affect the average accuracy of text retrieval or text categorization. Recent studies, however, have indicated that information extraction is significantly degraded by OCR error. We experimented with information extraction software on two collections, one with OCR-ed documents and another with manually corrected versions of the former. We discovered a significant reduction in accuracy on the OCR text versus the corrected text. The majority of errors were attributable to zoning problems rather than OCR classification errors.

Disciplines

Electrical and Computer Engineering | Engineering

Language

English

Permissions

Use Find in Your Library, contact the author, or interlibrary loan to garner a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.

Repository Citation

Taghva, K., Beckley, R., Coombs, J. (2006). The Effects of OCR Error on the Extraction of Private Information. In Horst Bunke; A. Lawrence Spitz, Document Analysis Systems VII 348-357. Springer Berlin Heidelberg.
