Award Date
1-1-2003
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Committee Member
Kazem Taghva
Number of Pages
66
Abstract
The question posed in this thesis is whether the effectiveness of the rule-based approach to automatic text categorization of OCR collections can be improved by using domain-specific thesauri. A rule-based categorizer was constructed, consisting of a C++ program called C-KANT that processes documents and generates a program executable by the CLIPS expert system shell. A series of tests using domain-specific thesauri showed that a query expansion approach to rule-based automatic text categorization using domain-dependent thesauri does not improve the categorization of OCR texts. Although rules built over a mixture of thesauri yielded some improvement in categorization, the improvements were small.
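To make the idea of thesaurus-based query expansion for rule-based categorization concrete, here is a minimal sketch. It is not C-KANT and does not produce CLIPS rules; it is written in Python for brevity. The rule format (a category mapped to a set of trigger terms), the toy thesaurus entries, the function names `expand_terms` and `categorize`, and the `min_hits` threshold are all hypothetical and chosen only for illustration. Each rule's terms are expanded with related terms from one or more thesauri (a list of several thesauri stands in for a "mixture"), and a category is assigned when enough distinct expanded terms appear among the document's tokens.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus type: maps a term to a set of related terms.
Thesaurus = Dict[str, Set[str]]


def expand_terms(terms: Set[str], thesauri: List[Thesaurus]) -> Set[str]:
    """Return the original rule terms plus every related term found in any thesaurus."""
    expanded = set(terms)
    for thesaurus in thesauri:
        for term in terms:
            expanded |= thesaurus.get(term, set())
    return expanded


def categorize(tokens: List[str], rules: Dict[str, Set[str]],
               thesauri: List[Thesaurus], min_hits: int = 2) -> List[str]:
    """Assign each category whose expanded term set matches at least
    min_hits distinct tokens of the document (exact, case-insensitive match)."""
    token_set = {t.lower() for t in tokens}
    assigned = []
    for category, terms in rules.items():
        hits = len(expand_terms(terms, thesauri) & token_set)
        if hits >= min_hits:
            assigned.append(category)
    return sorted(assigned)


# Toy data (hypothetical, not taken from the thesis).
rules = {"acq": {"acquisition", "merger"}, "grain": {"wheat", "corn"}}
thesaurus = {"acquisition": {"takeover", "buyout"}, "wheat": {"grain"}}
tokens = "the takeover was a buyout of the firm".split()

print(categorize(tokens, rules, []))           # []      (no rule term occurs)
print(categorize(tokens, rules, [thesaurus]))  # ['acq'] (matched via expanded terms)
```

Matching here is exact string comparison, so an OCR error such as a single misrecognized character makes a token miss both the original and the expanded terms; the sketch does not model such errors.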
Keywords
Aided; Based; Categorization; Learning; OCR; Rule; Texts; Thesaurus
Controlled Subject
Computer science; Artificial intelligence
File Size
1771.52 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Coombs, Jeffrey Scott, "Thesaurus-aided learning for rule-based categorization of OCR texts" (2003). UNLV Retrospective Theses & Dissertations. 1614.
http://dx.doi.org/10.25669/2lfp-mwgp
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/