Award Date
5-2010
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Kazem Taghva, Chair
Second Committee Member
Ajoy K. Datta
Third Committee Member
Laxmi P. Gewali
Graduate Faculty Representative
Muthukumar Venkatesan
Number of Pages
66
Abstract
Automated text categorization is a supervised learning task, defined as assigning category labels to new documents based on the likelihood suggested by a training set of labeled documents. Two examples of text categorization methods are Naive Bayes and K-Nearest Neighbor.
In this thesis, we implement two categorization engines, one based on Naive Bayes and one based on K-Nearest Neighbor. We then compare the effectiveness of the two engines by calculating standard precision and recall for a collection of documents. We also report on the time efficiency of the two engines.
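As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes per-category precision and recall from true and predicted labels. It is not taken from the thesis: the function name `precision_recall`, the sample labels, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this example.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Return (precision, recall) for a single category.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: documents of this category that were labeled as such.
    tp = sum(1 for t, p in pairs if t == category and p == category)
    # False positives: documents of other categories labeled as this one.
    fp = sum(1 for t, p in pairs if t != category and p == category)
    # False negatives: documents of this category labeled as something else.
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for five documents (not data from the thesis).
true_labels = ["sports", "sports", "politics", "politics", "sports"]
predicted = ["sports", "politics", "politics", "sports", "sports"]

for category in ("sports", "politics"):
    p, r = precision_recall(true_labels, predicted, category)
    print(f"{category}: precision={p:.2f} recall={r:.2f}")
```

The same function could be applied to the output of either engine, so that the two classifiers are scored on identical documents and categories.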
Keywords
Automatic classification; Automatic indexing; Information Retrieval; Machine learning; Text processing (Computer science)
Disciplines
Computer Sciences | Databases and Information Systems | Library and Information Science
File Format
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Karamcheti, Aditya Chainulu, "A Comparative study on text categorization" (2010). UNLV Theses, Dissertations, Professional Papers, and Capstones. 322.
http://dx.doi.org/10.34917/1563704
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/