Feature Selection for Document Type Classification

Meeting name

Fifth International Conference on Information Technology: New Generations

Document Type

Conference Proceeding

Meeting location

Las Vegas, NV

Publication Date

4-7-2008

Abstract

In this paper, we report on the identification of document type using a k-dependence Bayesian categorization engine. In particular, we show that the use of font and capitalization as features improves precision and recall.
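
As a rough illustration of how capitalization can serve as a classification feature, the sketch below derives two boolean capitalization features from a line of text and feeds them to a naive Bayes classifier, which is the zero-dependence special case of a k-dependence Bayesian classifier. It is not the categorization engine described in the paper. The feature names, the 0.5 threshold, the toy training lines, and the labels "report" and "letter" are all assumptions made for this example. Font features are omitted because they would require OCR layout metadata.

```python
import math
from collections import Counter, defaultdict


def capitalization_features(line):
    """Derive two boolean capitalization features (a hypothetical feature set)."""
    # Keep only tokens that contain at least one letter.
    words = [w for w in line.split() if any(c.isalpha() for c in w)]
    if not words:
        return {"all_caps": False, "mostly_capitalized": False}
    capitalized = sum(1 for w in words if w[0].isupper())
    return {
        "all_caps": all(w == w.upper() for w in words),
        # The 0.5 threshold is an arbitrary choice for this sketch.
        "mostly_capitalized": capitalized / len(words) >= 0.5,
    }


class NaiveBayes:
    """Naive Bayes over boolean features (the k = 0 case of k-dependence)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)

    def fit(self, samples, labels):
        for feats, label in zip(samples, labels):
            self.class_counts[label] += 1
            for name, value in feats.items():
                self.feature_counts[label][(name, value)] += 1
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in feats.items():
                # Laplace smoothing over the two possible boolean values.
                seen = self.feature_counts[label][(name, value)]
                score += math.log((seen + 1) / (count + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data; the lines and labels are invented for illustration only.
train_lines = [
    "ANNUAL SAFETY REPORT",
    "QUARTERLY SITE REPORT",
    "Dear Dr. Jones, thank you for your letter",
    "dear colleague, please find the data attached",
]
train_labels = ["report", "report", "letter", "letter"]

model = NaiveBayes().fit(
    [capitalization_features(line) for line in train_lines], train_labels
)
print(model.predict(capitalization_features("FINAL TECHNICAL REPORT")))
```

A full k-dependence model would additionally let each feature depend on up to k other features; this sketch assumes the features are independent given the document type.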

Keywords

Bayes methods; Bayesian statistical decision theory; Capitalization; Character sets; Classification; Document classification; Document handling; Document type; Document type classification; Feature selection; Font; K-dependence Bayesian categorization engine; OCR; Text categorization

Disciplines

Computer Engineering | Databases and Information Systems | Electrical and Computer Engineering | Theory and Algorithms

Permissions

Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.
