Award Date
1-1-1999
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Committee Member
Kazem Taghva
Number of Pages
55
Abstract
This thesis reports on the effects of an automatic query expansion with a subject specific thesaurus on retrieval effectiveness for document collection consisting of OCR text; The investigation encompasses several experiments with a modern retrieval engine based on the probabilistic model. Each experiment is performed on two document collections. The first version of the collection consists of raw OCR output. The second collection consists of the ground truth (retyped from hard copy) version of the same collection; It is shown that the usage of the thesaurus as a source for query expansion can significantly improve recall for Boolean queries, for both OCR and manually corrected document collections. In the case of weighted queries, the expansion has no effect on the average precision and recall. Nevertheless, some individual queries benefit from query expansion.
Keywords
Effectiveness; OCR; Retrieval; Text; Thesauri
Controlled Subject
Computer science; Information science
File Format
File Size
1576.96 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Dimitrova, Elena, "Retrieval effectiveness for OCR text using thesauri" (1999). UNLV Retrospective Theses & Dissertations. 1097.
http://dx.doi.org/10.25669/34lc-b6dq
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/
COinS