Award Date

1-1-1999

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Committee Member

Kazem Taghva

Number of Pages

55

Abstract

This thesis reports on the effects of automatic query expansion with a subject-specific thesaurus on retrieval effectiveness for a document collection consisting of OCR text.

The investigation encompasses several experiments with a modern retrieval engine based on the probabilistic model. Each experiment is performed on two versions of the same document collection: the first consists of raw OCR output, and the second consists of the ground-truth version (retyped from hard copy).

It is shown that using the thesaurus as a source for query expansion can significantly improve recall for Boolean queries, for both the OCR and the manually corrected collections. For weighted queries, the expansion has no effect on average precision and recall, although some individual queries do benefit from query expansion.
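The following is a minimal sketch of why thesaurus-based expansion can only help recall for Boolean queries. It assumes that a query is a conjunction of terms and that each term is OR-ed with its thesaurus entries. The thesaurus, documents, relevance judgments, and the names `THESAURUS`, `expand`, `search`, and `recall` are hypothetical illustrations, not taken from the thesis or its retrieval engine. Each expanded clause is a superset of the original one, so the result set can only grow. Recall therefore cannot drop, although precision may.

```python
# Hypothetical subject-specific thesaurus: term -> related terms.
THESAURUS = {
    "radioactive": ["radiological"],
    "waste": ["refuse", "residue"],
}


def expand(query):
    """Turn a conjunctive Boolean query (all terms required) into a
    conjunction of disjunctions: each term is OR-ed with its thesaurus entries."""
    return [{term, *THESAURUS.get(term, [])} for term in query]


def matches(doc_terms, clauses):
    # A document matches if every clause shares at least one term with it.
    return all(clause & doc_terms for clause in clauses)


def search(collection, clauses):
    # Return the ids of all documents satisfying the Boolean query.
    return {doc_id for doc_id, text in collection.items()
            if matches(set(text.split()), clauses)}


def recall(retrieved, relevant):
    # Fraction of the relevant documents that were retrieved.
    return len(retrieved & relevant) / len(relevant)


# Toy collection and relevance judgments (hypothetical data).
docs = {
    "d1": "radioactive waste storage",
    "d2": "radiological waste disposal",
    "d3": "radioactive refuse site",
    "d4": "groundwater monitoring report",
}
relevant = {"d1", "d2", "d3"}
query = ["radioactive", "waste"]

plain = search(docs, [{t} for t in query])  # unexpanded query
expanded = search(docs, expand(query))      # thesaurus-expanded query
print(sorted(plain), f"{recall(plain, relevant):.2f}")        # ['d1'] 0.33
print(sorted(expanded), f"{recall(expanded, relevant):.2f}")  # ['d1', 'd2', 'd3'] 1.00
```

For weighted (ranked) queries the effect is less direct. Added terms change document scores rather than simply admitting more documents, which is consistent with the abstract's observation that only some individual queries benefit.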

Keywords

Effectiveness; OCR; Retrieval; Text; Thesauri

Controlled Subject

Computer science; Information science

File Format

pdf

File Size

1576.96 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Permissions

If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.

Identifier

https://doi.org/10.25669/34lc-b6dq
