The Effects of Noisy Data on Text Retrieval
Document Type
Article
Publication Date
1-1994
Publication Title
Journal of the American Society for Information Science
Volume
45
Issue
1
First page number:
50
Last page number:
58
Abstract
We report on the results of our experiments on query evaluation in the presence of noisy data. In particular, an OCR-generated database and its corresponding 99.8% correct version are used to process a set of queries to determine the effect the degraded version will have on retrieval. It is shown that, with the set of scientific documents we use in our testing, the effect is insignificant. We further improve the result by applying an automatic postprocessing system designed to correct the kinds of errors generated by recognition devices.
Disciplines
Electrical and Computer Engineering | Engineering
Language
English
Repository Citation
Taghva, K., Borsack, J., Condit, A., Erva, S. (1994). The Effects of Noisy Data on Text Retrieval. Journal of the American Society for Information Science, 45(1), 50-58.