Award Date

5-2009

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Kazem Taghva, Chair

Second Committee Member

Ajoy K. Datta

Third Committee Member

Laxmi P. Gewali

Graduate Faculty Representative

Muthukumar Venkatesan

Number of Pages

71

Abstract

Automatic text categorization is the task of assigning an electronic document to one or more categories based on its contents. Many techniques are known to solve categorization problems efficiently. Typically, these techniques fall into two distinct methodologies: logic based or probabilistic. In recent years, many researchers have tried approaches that are a hybrid of these two methodologies.

In this thesis, we deal with document categorization using the Apriori algorithm. The Apriori algorithm was initially developed for data mining and basket analysis applications in relational databases. Although the technique is logic based, it also relies on the statistical characteristics of the data. As part of this work, we will implement all the tools necessary to carry out automatic categorization using the Apriori algorithm. We will also report on the categorization effectiveness of this technique when applied to standard collections.

Keywords

Bayesian statistical decision theory; Computer algorithms; Data mining; Machine learning

Disciplines

Computer Engineering | Systems and Communications

Language

English

Comments

Signatures have been redacted for privacy and security purposes.

