Award Date

5-2010

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Kazem Taghva, Chair

Second Committee Member

Ajoy K. Datta

Third Committee Member

Laxmi P. Gewali

Graduate Faculty Representative

Muthukumar Venkatesan

Number of Pages

66

Abstract

Automated text categorization is a supervised learning task, defined as assigning category labels to new documents based on the likelihood suggested by a training set of labeled documents. Two examples of text categorization methods are Naive Bayes and K-Nearest Neighbor.

In this thesis, we implement two categorization engines, one based on Naive Bayes and one based on K-Nearest Neighbor. We then compare the effectiveness of the two engines by calculating standard precision and recall on a collection of documents. We also report on the time efficiency of the two engines.
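
As an illustration of the evaluation measures named above, the following minimal sketch computes per-category precision and recall from true and predicted labels. It is not taken from the thesis: the function name `precision_recall`, the sample labels, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this example.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Return (precision, recall) for one category.

    Hypothetical helper for illustration only; not the thesis's code.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: documents of this category that were labeled as such.
    tp = sum(1 for t, p in pairs if t == category and p == category)
    # False positives: documents of other categories labeled as this one.
    fp = sum(1 for t, p in pairs if t != category and p == category)
    # False negatives: documents of this category labeled as something else.
    fn = sum(1 for t, p in pairs if t == category and p != category)
    # Assumed convention: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for four documents.
true_labels = ["sports", "sports", "politics", "politics"]
predicted_labels = ["sports", "politics", "politics", "politics"]

print(precision_recall(true_labels, predicted_labels, "sports"))
```

Precision is the fraction of documents assigned to a category that truly belong to it, and recall is the fraction of documents truly in the category that the engine found, so an engine can score high on one while scoring low on the other.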

Keywords

Automatic classification; Automatic indexing; Information Retrieval; Machine learning; Text processing (Computer science)

Disciplines

Computer Sciences | Databases and Information Systems | Library and Information Science

Language

English

