The Effects of OCR Error on the Extraction of Private Information
Editors
Horst Bunke; A. Lawrence Spitz
Document Type
Chapter
Publication Date
2006
Publication Title
Document Analysis Systems VII
Publisher
Springer Berlin Heidelberg
First page number
348
Last page number
357
Abstract
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization. Recent studies, however, have indicated that information extraction is significantly degraded by OCR error. We experimented with information extraction software on two collections, one with OCR-ed documents and another with manually corrected versions of the former. We discovered a significant reduction in accuracy on the OCR text versus the corrected text. The majority of errors were attributable to zoning problems rather than OCR classification errors.
Disciplines
Electrical and Computer Engineering | Engineering
Language
English
Permissions
Use Find in Your Library, contact the author, or request an interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or if publisher policy changes, the item will be deposited.
Repository Citation
Taghva, K., Beckley, R., & Coombs, J. (2006). The Effects of OCR Error on the Extraction of Private Information. In Horst Bunke & A. Lawrence Spitz (Eds.), Document Analysis Systems VII (pp. 348-357). Springer Berlin Heidelberg.