Award Date

12-2011

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Kazem Taghva, Chair

Second Committee Member

Laxmi Gewali

Third Committee Member

Ajoy Datta

Graduate Faculty Representative

Venki Mukhukumar

Number of Pages

Abstract

This thesis discusses feature selection algorithms for text-categorization. Feature selection algorithms are very important, as they can make or break a categorization engine. The algorithms discussed are Document Frequency, Information Gain, Chi Squared, Mutual Information, the NGL (Ng-Goh-Low) coefficient, and the GSS (Galavotti-Sebastiani-Simi) coefficient. The general idea of any feature selection algorithm is to score the importance of words using some measure, so that informative words are kept and non-informative words are removed; this in turn helps the text-categorization engine assign a document, D, to some category, C. These feature selection methods are explained and implemented, and results are provided for each. The thesis also discusses how we gathered and constructed the training and testing data, along with the setup and storage techniques we used.
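To make the idea of scoring words against a category concrete, the following minimal sketch computes four of the measures named in the abstract from a 2x2 term/category contingency table, using the formulations commonly given in the text-categorization literature. It is an illustration only, not the thesis's implementation: the toy corpus, the function names, and the `Counts` type are assumptions introduced here, and the thesis's exact definitions may differ.

```python
import math
from collections import namedtuple

# Contingency counts for a term t and a category c (names are hypothetical):
#   a = docs in c that contain t       b = docs not in c that contain t
#   c = docs in c that lack t          d = docs not in c that lack t
Counts = namedtuple("Counts", "a b c d")


def contingency(docs, term, category):
    """Count documents by (contains term?, belongs to category?)."""
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return Counts(a, b, c, d)


def _denominator(k):
    # Product of the four marginal totals of the 2x2 table.
    return (k.a + k.c) * (k.b + k.d) * (k.a + k.b) * (k.c + k.d)


def document_frequency(k):
    """Number of documents that contain the term, in any category."""
    return k.a + k.b


def chi_squared(k):
    """Chi-squared statistic: N (ad - cb)^2 / product of marginals."""
    n, denom = sum(k), _denominator(k)
    return 0.0 if denom == 0 else n * (k.a * k.d - k.c * k.b) ** 2 / denom


def ngl(k):
    """NGL coefficient: signed square root of chi-squared."""
    n, denom = sum(k), _denominator(k)
    if denom == 0:
        return 0.0
    return math.sqrt(n) * (k.a * k.d - k.c * k.b) / math.sqrt(denom)


def gss(k):
    """GSS coefficient: P(t,c)P(not t,not c) - P(t,not c)P(not t,c)."""
    n = sum(k)
    return (k.a * k.d - k.c * k.b) / n ** 2


# Toy corpus (assumed for illustration): (set of tokens, category label).
docs = [
    ({"goal", "match", "team"}, "sports"),
    ({"team", "score", "the"}, "sports"),
    ({"election", "vote", "the"}, "politics"),
    ({"vote", "party", "the"}, "politics"),
]

for term in ("team", "the"):
    k = contingency(docs, term, "sports")
    print(term, document_frequency(k), chi_squared(k), ngl(k), gss(k))
```

In this toy corpus, "team" appears only in the sports documents and receives the highest chi-squared, NGL, and GSS scores for that category, while "the" has the higher document frequency but a negative NGL and GSS score. This shows why a raw frequency count and a category-aware measure can rank the same word differently.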

Keywords

Applied sciences; Computational linguistics; Feature selection; Text categorization; Text processing (Computer science)

Disciplines

Computer Sciences | Databases and Information Systems | Systems Architecture

File Format

pdf

Degree Grantor

University of Nevada, Las Vegas

Language

English

Repository Citation

Dave, Kandarp, "Study of feature selection algorithms for text-categorization" (2011). UNLV Theses, Dissertations, Professional Papers, and Capstones. 1380.
http://dx.doi.org/10.34917/3274698

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Included in

Databases and Information Systems Commons, Systems Architecture Commons
