Master of Science (MS)
First Committee Member
Number of Pages
The question posed in this thesis is whether the effectiveness of the rule-based approach to automatic text categorization on OCR collections can be improved by using domain-specific thesauri. A rule-based categorizer was constructed consisting of a C++ program called C-KANT which consults documents and creates a program which can be executed by the CLIPS expert system shell. A series of tests using domain-specific thesauri revealed that a query expansion approach to rule-based automatic text categorization using domain-dependent thesauri will not improve the categorization of OCR texts. Although some improvement to categorization could be made using rules over a mixture of thesauri, the improvements were not significantly large.
Aided; Based; Categorization; Learning; OCR; Rule; Texts; Thesaurus
Computer science; Artificial intelligence
University of Nevada, Las Vegas
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to firstname.lastname@example.org and include clear identification of the work, preferably with URL.
Coombs, Jeffrey Scott, "Thesaurus-aided learning for rule-based categorization of Ocr texts" (2003). UNLV Retrospective Theses & Dissertations. 1614.
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/