Master of Science (MS)
Presented in this thesis is a study of the effect of OCR errors on short documents. OCR recognizes text in an image and translates it into ASCII format. When this data is retrieved in response to a query, the retrieval performance depends on the efficiency of the OCR device used. Recall, precision, and ranking were used to gauge retrieval performance. The information retrieval system used was SMART, which is based on the vector space model. Evaluation of these measures showed that average precision and recall are not significantly affected when the OCR collection is compared to its corrected version. However, with more complex weighting schemes, the rankings of relevant documents became more divergent. The effect of an automatic post-processing system on retrieval performance was also studied.
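For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of set-based precision and recall for a single query. It is not taken from the thesis or from SMART; the function name `precision_recall` and all document IDs are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data, invented for illustration; these are not thesis results.
relevant = ["d1", "d2", "d3", "d4"]
corrected_run = ["d1", "d2", "d5", "d3"]  # retrieval from a corrected collection
ocr_run = ["d1", "d5", "d2", "d6"]        # retrieval from an OCR collection

p_corr, r_corr = precision_recall(corrected_run, relevant)
p_ocr, r_ocr = precision_recall(ocr_run, relevant)
print(p_corr, r_corr)
print(p_ocr, r_ocr)
```

Comparing the two pairs of values for the same query is one simple way to see how recognition errors could shift retrieval performance; averaging over many queries would give the kind of aggregate figures the abstract refers to.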
Documents; Effect; Errors; OCR; Short
University of Nevada, Las Vegas
Inaparthy, Padma, "Effect of OCR errors on short documents" (1994). UNLV Retrospective Theses & Dissertations. 455.
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/