Master of Science (MS)
This thesis reports on the effects of automatic query expansion with a subject-specific thesaurus on retrieval effectiveness for a document collection consisting of OCR text.
The investigation encompasses several experiments with a modern retrieval engine based on the probabilistic model. Each experiment is performed on two versions of the same document collection: the first consists of raw OCR output, and the second consists of the ground truth version, retyped from hard copy.
It is shown that using the thesaurus as a source for query expansion can significantly improve recall for Boolean queries, for both the OCR and the manually corrected document collections. For weighted queries, the expansion has no effect on average precision and recall. Nevertheless, some individual queries benefit from query expansion.
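To make the Boolean case concrete, here is a minimal sketch of thesaurus-based expansion, assuming a conjunctive query in which each term is replaced by an OR-group of the term and its thesaurus entries. The thesaurus, documents, and relevance judgments below are hypothetical toy data, not material from the thesis, and the functions `expand_query`, `retrieve`, `recall`, and `precision` are illustrative names rather than the retrieval engine's API.

```python
from typing import Dict, List, Set

# Hypothetical subject thesaurus (toy data): term -> related terms.
THESAURUS: Dict[str, List[str]] = {
    "radioactive": ["radiological"],
    "waste": ["refuse", "residue"],
}

# Hypothetical toy collection; "w4ste" imitates an OCR recognition error.
DOCS: Dict[str, str] = {
    "d1": "storage of radioactive waste at the site",
    "d2": "radiological refuse was shipped offsite",
    "d3": "radioactive w4ste drums",
    "d4": "annual budget report",
}
RELEVANT: Set[str] = {"d1", "d2", "d3"}  # hypothetical relevance judgments


def expand_query(terms: List[str], thesaurus: Dict[str, List[str]]) -> List[Set[str]]:
    """Rewrite AND(t1, t2, ...) as AND(OR(t1, related...), OR(t2, related...), ...)."""
    return [{t, *thesaurus.get(t, [])} for t in terms]


def retrieve(docs: Dict[str, str], query: List[Set[str]]) -> Set[str]:
    """Boolean match: every OR-group must share at least one token with the document."""
    hits = set()
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        if all(group & tokens for group in query):
            hits.add(doc_id)
    return hits


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


terms = ["radioactive", "waste"]
base = retrieve(DOCS, [{t} for t in terms])              # unexpanded query
expanded = retrieve(DOCS, expand_query(terms, THESAURUS))  # thesaurus-expanded query
print(sorted(base), f"recall={recall(base, RELEVANT):.2f}")          # ['d1'] recall=0.33
print(sorted(expanded), f"recall={recall(expanded, RELEVANT):.2f}")  # ['d1', 'd2'] recall=0.67
```

In this toy run, expansion raises recall by matching "radiological refuse", but it cannot recover d3, where the OCR error has corrupted the query term itself. The sketch models only Boolean matching; it does not reproduce the weighted queries or the probabilistic ranking used in the thesis experiments.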
Effectiveness; OCR; Retrieval; Text; Thesauri
Computer science; Information science
University of Nevada, Las Vegas
Dimitrova, Elena, "Retrieval effectiveness for OCR text using thesauri" (1999). UNLV Retrospective Theses & Dissertations. 1097.