Award Date
1-1-1994
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Committee Member
Kazem Taghva
Number of Pages
27
Abstract
Presented in this thesis is a study of the effect of OCR errors on short documents. OCR recognizes and translates text image into ASCII format. When this data is retrieved in response to a query, the retrieval performance depends on the efficiency of the OCR device used. Measures like recall, precision and ranking were used to gauge the retrieval performance. The information retrieval system that was used is SMART, based on the vector space model. On evaluating these measures, it has been concluded that average precision and recall are not affected significantly when the OCR collection is compared to its corrected version. However, it was also concluded that with more complex weighting schemes, the relevant document rankings became more divergent. Also, the effect of an automatic post-processing system on the retrieval performance was studied.
Keywords
Documents; Effect; Errors; OCR; Short
Controlled Subject
Computer science
File Format
File Size
1351.68 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Inaparthy, Padma, "Effect of OCR errors on short documents" (1994). UNLV Retrospective Theses & Dissertations. 455.
http://dx.doi.org/10.25669/2gel-89vo
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/
COinS