Computer Science Faculty Research

OCR Post Processing Using Support Vector Machines

Jorge Ramon Fonseca Cacho, University of Nevada, Las Vegas
Kazem Taghva, University of Nevada, Las Vegas

Document Type

Conference Proceeding

Publication Date

7-4-2020

Publication Title

Science and Information Conference

Publisher

Springer

Publisher Location

London, United Kingdom

Volume

1229

First page number:

694

Last page number:

713

Abstract

In this paper, we introduce a set of detailed experiments using Support Vector Machines (SVM) to improve the accuracy of selecting the correct candidate word when correcting OCR-generated errors. We use our alignment algorithm to create a one-to-one correspondence between the OCR text and the clean version of the TREC-5 data set (Confusion Track). We then extract five features from the candidates suggested by the Google Web 1T corpus and use them to train and test our SVM model, which then generalizes to the rest of the unseen text. We then improve on our initial results using a polynomial kernel, feature standardization with min-max normalization, and class balancing with SMOTE. Finally, we analyze the errors and suggest future improvements.
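
As a rough illustration of two of the preprocessing steps named in the abstract, the following minimal Python sketch rescales features with min-max normalization and then balances classes with a simplified SMOTE-style interpolation. It is not the authors' implementation: the function names `min_max_scale` and `smote_like`, the toy feature values, and the choice of `k` are all hypothetical, and the abstract does not say what the five features are.

```python
import random

def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy data: five made-up numeric features per candidate word.
correct = [[1, 120, 3, 0.9, 5], [0, 300, 4, 0.8, 6], [1, 210, 5, 0.7, 4]]
incorrect = [[3, 10, 3, 0.1, 5], [2, 40, 4, 0.2, 6], [4, 5, 5, 0.3, 4],
             [3, 25, 6, 0.1, 7], [2, 15, 3, 0.2, 5], [5, 30, 4, 0.0, 6]]

# Scale all candidates together, then oversample the rarer "correct" class.
scaled = min_max_scale(correct + incorrect)
scaled_correct, scaled_incorrect = scaled[:len(correct)], scaled[len(correct):]
extra = smote_like(scaled_correct, len(scaled_incorrect) - len(scaled_correct))
balanced_correct = scaled_correct + extra
```

The balanced, scaled rows could then be passed to any SVM implementation that supports a polynomial kernel, for instance scikit-learn's `SVC(kernel="poly")`; which library the authors used is not stated here.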

Keywords

OCR; Support vector machines; SVM; OCR post processing; SMOTE

Disciplines

Computational Engineering | Systems and Communications

Language

English

Repository Citation

Fonseca Cacho, J. R., Taghva, K. (2020). OCR Post Processing Using Support Vector Machines. Science and Information Conference, 1229, 694-713. London, United Kingdom: Springer.
http://dx.doi.org/10.1007/978-3-030-52246-9_51

Digital Scholarship@UNLV