The Effects of Noisy Data on Text Retrieval

Document Type

Article

Publication Date

1-1994

Publication Title

Journal of the American Society for Information Science

Volume

45

Issue

1

First page number:

50

Last page number:

58

Abstract

We report on the results of our experiments on query evaluation in the presence of noisy data. In particular, an OCR-generated database and its corresponding 99.8% correct version are used to process a set of queries to determine the effect the degraded version will have on retrieval. It is shown that, with the set of scientific documents we use in our testing, the effect is insignificant. We further improve the result by applying an automatic postprocessing system designed to correct the kinds of errors generated by recognition devices.

Disciplines

Electrical and Computer Engineering | Engineering

Language

English
