Post-Editing through Approximation and Global Correction
Document Type
Article
Publication Date
12-1995
Publication Title
International Journal of Pattern Recognition and Artificial Intelligence
Volume
9
Issue
6
First page number
911
Last page number
923
Abstract
This paper describes a new automatic spelling correction program to deal with OCR-generated errors. The method used here is based on three principles:
1. Approximate string matching between the misspellings and the terms occurring in the database, as opposed to the entire dictionary.
2. Local information obtained from the individual documents.
3. The use of a confusion matrix, which contains information inherently specific to the nature of errors caused by the particular OCR device.
This system is then utilized to process approximately 10,000 pages of OCR-generated documents. Among the misspellings discovered by this algorithm, about 87% were corrected.
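The three principles named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `CONFUSION_COST` values, and the `max_cost` threshold are all hypothetical and chosen only for illustration. The sketch assumes that candidates come from the terms found in the collection rather than a full dictionary, that known OCR confusions are modeled as cheaper substitutions in an edit distance, and that ties are broken by how often a candidate occurs in the same document.

```python
from collections import Counter

# Hypothetical confusion costs (illustrative values, not from the paper):
# (character produced by OCR, character intended) -> substitution cost.
CONFUSION_COST = {("1", "l"): 0.2, ("0", "o"): 0.2, ("c", "e"): 0.4}


def weighted_distance(observed, candidate, confusion=CONFUSION_COST):
    """Edit distance in which known OCR confusions are cheaper substitutions."""
    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, oc in enumerate(observed, start=1):
        curr = [float(i)]
        for j, cc in enumerate(candidate, start=1):
            # Matching characters cost nothing; unknown substitutions cost 1.0.
            sub = 0.0 if oc == cc else confusion.get((oc, cc), 1.0)
            curr.append(min(prev[j] + 1.0,       # delete a character from observed
                            curr[j - 1] + 1.0,   # insert a character into observed
                            prev[j - 1] + sub))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word, collection_terms, doc_counts, max_cost=1.0):
    """Return the closest collection term, or the word itself if none is close.

    Ties in cost are broken by the candidate's frequency in the document.
    """
    best = None
    for term in collection_terms:
        cost = weighted_distance(word, term)
        if cost > max_cost:
            continue
        key = (cost, -doc_counts.get(term, 0), term)
        if best is None or key < best:
            best = key
    return best[2] if best else word


# Example usage with made-up data.
collection_terms = {"model", "modal", "signal"}
doc_counts = Counter(["model", "model", "signal"])
print(correct("mode1", collection_terms, doc_counts))
```

In this sketch a misspelling with no sufficiently close collection term is left unchanged, which is one possible way to avoid overcorrecting; the paper's actual decision rule may differ.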
Keywords
Confusion matrix; Optical character recognition
Disciplines
Electrical and Computer Engineering | Engineering
Language
English
Permissions
Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.
Repository Citation
Taghva, K., Borsack, J., Bullard, B., Condit, A. (1995). Post-Editing through Approximation and Global Correction. International Journal of Pattern Recognition and Artificial Intelligence, 9(6), 911-923.