Award Date

1-1-2003

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Committee Member

Kazem Taghva

Abstract

This thesis asks whether domain-specific thesauri can improve the effectiveness of rule-based automatic text categorization on OCR collections. The rule-based categorizer built for the study consists of a C++ program, C-KANT, which consults documents and generates a program that can be executed by the CLIPS expert system shell. A series of tests with domain-specific thesauri showed that a query expansion approach to rule-based automatic text categorization using domain-dependent thesauri does not improve the categorization of OCR texts. Although rules over a mixture of thesauri produced some improvement in categorization, the improvements were not large.
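
As a rough illustration of what thesaurus-based query expansion means for a rule-based categorizer, the following minimal Python sketch may help. It is not C-KANT or CLIPS code: the function names, the toy rules, the thesaurus entries, and the category labels are all hypothetical, and a rule is assumed to fire when any of its terms appears in the document.

```python
# Hypothetical illustration only: the rules, thesaurus entries, and category
# names are invented, and this is neither C-KANT nor CLIPS code.


def expand_terms(terms, thesaurus):
    """Return the rule terms plus any thesaurus entries listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded.update(thesaurus.get(term, ()))
    return expanded


def categorize(text, rules, thesaurus=None):
    """Assign every category whose (optionally expanded) terms occur in the text."""
    words = set(text.lower().split())
    categories = []
    for category, terms in rules.items():
        candidates = expand_terms(terms, thesaurus) if thesaurus else set(terms)
        if words & candidates:  # assumed rule semantics: any shared term fires
            categories.append(category)
    return sorted(categories)


# Toy rules and a toy domain thesaurus (both assumptions for this sketch).
rules = {"hydrology": {"groundwater"}, "geology": {"basalt"}}
thesaurus = {"groundwater": {"aquifer"}, "basalt": {"lava"}}

document = "the aquifer lies beneath a lava flow"
plain = categorize(document, rules)                # rules as written
expanded = categorize(document, rules, thesaurus)  # rules after expansion
```

In this toy case the unexpanded rules assign no category, while the expanded rules assign both. Whether such additional matches help or hurt on real OCR text is the empirical question the thesis tests, and the abstract reports that expansion did not improve categorization.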

Keywords

Aided; Based; Categorization; Learning; OCR; Rule; Texts; Thesaurus

Controlled Subject

Computer science; Artificial intelligence

File Format

pdf

File Size

1771.52 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Permissions

If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.

Repository Citation

Coombs, Jeffrey Scott, "Thesaurus-aided learning for rule-based categorization of OCR texts" (2003). UNLV Retrospective Theses & Dissertations. 1614.
http://dx.doi.org/10.25669/2lfp-mwgp

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Thesaurus-aided learning for rule-based categorization of OCR texts

Author