Title
The Effects of Noisy Data on Text Retrieval
Document Type
Article
Abstract
We report the results of experiments on query evaluation in the presence of noisy data. In particular, an OCR-generated database and a corresponding version that is 99.8% correct are used to process the same set of queries, in order to determine what effect the degraded version has on retrieval. We show that, for the set of scientific documents used in our testing, the effect is insignificant. We further improve the results by applying an automatic postprocessing system designed to correct the kinds of errors generated by recognition devices.
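The comparison described in the abstract (running the same queries against a clean collection and its OCR-degraded counterpart) can be illustrated with a minimal sketch. It assumes a toy term-frequency ranking and uses Jaccard overlap of the top-k results as the comparison measure. These are illustrative choices, not the retrieval system or evaluation measures used in the study, and the two tiny collections (`clean`, `ocr`) are hypothetical examples.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of letters/digits; an OCR error such as
    # "retrieva1" becomes a distinct token that no longer matches "retrieval".
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(collection: dict[str, str], query: str, k: int = 10) -> list[str]:
    # Toy ranking (an assumption, not the study's system): score each document
    # by the total count of query-term occurrences, keep only nonzero scores.
    terms = tokenize(query)
    scores = {}
    for doc_id, text in collection.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score, breaking ties by doc_id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered][:k]


def overlap(a: list[str], b: list[str]) -> float:
    # Jaccard overlap of two result lists (1.0 when both are empty).
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical clean collection and an OCR-degraded copy of it.
clean = {
    "d1": "optical character recognition errors",
    "d2": "text retrieval with noisy data",
    "d3": "scientific documents",
}
ocr = {
    "d1": "optical charactcr recognition errors",
    "d2": "text retrieva1 with noisy data",
    "d3": "scientific documents",
}

for query in ["text retrieval", "retrieval"]:
    r_clean, r_ocr = rank(clean, query), rank(ocr, query)
    print(query, r_clean, r_ocr, overlap(r_clean, r_ocr))
```

In this toy setup the query "text retrieval" still finds `d2` in the degraded copy because one term survives, while the single-term query "retrieval" loses it entirely. A sketch like this shows why the impact of OCR errors depends on query length and term redundancy.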
Disciplines
Electrical and Computer Engineering | Engineering
Citation Information
Taghva, K., Borsack, J., Condit, A., & Erva, S. (1994). The Effects of Noisy Data on Text Retrieval. Journal of the American Society for Information Science, 45(1), 50-58.