Title

The Effects of Noisy Data on Text Retrieval

Document Type

Article

Abstract

We report on the results of our experiments on query evaluation in the presence of noisy data. In particular, an OCR-generated database and its corresponding 99.8% correct version are used to process a set of queries to determine the effect the degraded version will have on retrieval. It is shown that, with the set of scientific documents we use in our testing, the effect is insignificant. We further improve the result by applying an automatic postprocessing system designed to correct the kinds of errors generated by recognition devices.

Disciplines

Electrical and Computer Engineering | Engineering
