Master of Science in Computer Science
First Committee Member
Kazem Taghva, Chair
Second Committee Member
Third Committee Member
Graduate Faculty Representative
Number of Pages
This thesis discusses feature selection algorithms for text categorization. Feature selection algorithms are very important because they can make or break a categorization engine. The algorithms discussed are Document Frequency, Information Gain, Chi Squared, Mutual Information, the NGL (Ng-Goh-Low) coefficient, and the GSS (Galavotti-Sebastiani-Simi) coefficient. The general idea of any feature selection algorithm is to determine the importance of words using some measure that keeps informative words and removes non-informative words, which in turn helps the text-categorization engine assign a document, D, to some category, C. These feature selection methods are explained and implemented in this thesis, and results are provided for them. The thesis also discusses how we gathered and constructed the training and testing data, along with the setup and storage techniques we used.
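As a rough illustration of the general idea described in the abstract, the sketch below scores words with two of the listed measures, Document Frequency and Chi Squared, and keeps the highest-scoring ones. It is a minimal sketch and not the thesis's implementation: the function names, the toy documents, and the category labels are all hypothetical, and the Chi Squared score uses the standard 2x2 contingency-table form of the statistic.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens with one label.
docs = [
    ["ball", "goal", "team"],
    ["team", "score", "goal"],
    ["vote", "party", "team"],
    ["vote", "election", "party"],
]
labels = ["sports", "sports", "politics", "politics"]


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def chi_squared(docs, labels, term, category):
    """Chi-squared statistic for one (term, category) pair.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # term or category is degenerate; treat as uninformative
    return n * (a * d - c * b) ** 2 / denominator


def select_features(docs, labels, category, k):
    """Return the k terms with the highest chi-squared score for a category."""
    vocabulary = sorted({term for tokens in docs for term in tokens})
    ranked = sorted(
        vocabulary,
        key=lambda term: chi_squared(docs, labels, term, category),
        reverse=True,
    )
    return ranked[:k]
```

On this toy corpus, "team" occurs in three of the four documents and so has a high document frequency, yet it separates the two categories poorly and receives a lower Chi Squared score than "goal". Chi Squared also scores words that are strongly tied to the absence of a category, such as "vote" for the sports category, because the statistic measures dependence in either direction.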
Applied sciences; Computational linguistics; Feature selection; Text categorization; Text processing (Computer science)
Computer Sciences | Databases and Information Systems | Systems Architecture
Dave, Kandarp, "Study of feature selection algorithms for text-categorization" (2011). UNLV Theses, Dissertations, Professional Papers, and Capstones. 1380.