Feature Selection for Document Type Classification

Meeting name

Fifth International Conference on Information Technology: New Generations

Document Type

Conference Proceeding

Meeting location

Las Vegas, NV

Publication Date



In this paper, we report on the identification of document type using a k-dependence Bayesian categorization engine. In particular, we show that the use of font and capitalization as features improves precision and recall.
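As a rough illustration of the two ideas named in the abstract, the sketch below computes simple capitalization features from a text and scores predictions with per-class precision and recall. It is not the paper's feature set or categorization engine: the function names, the two ratios, and the whitespace tokenization are assumptions made for illustration, and font features are omitted because they would need layout information from OCR output that plain text does not carry.

```python
from typing import Dict, List, Tuple


def capitalization_features(text: str) -> Dict[str, float]:
    """Fraction of tokens that are all-caps or title-case (hypothetical features)."""
    # Keep only whitespace-separated tokens that contain at least one letter.
    tokens = [t for t in text.split() if any(c.isalpha() for c in t)]
    if not tokens:
        return {"all_caps": 0.0, "title_case": 0.0}
    all_caps = sum(1 for t in tokens if t.isupper())
    title_case = sum(1 for t in tokens if t[0].isupper() and not t.isupper())
    return {
        "all_caps": all_caps / len(tokens),
        "title_case": title_case / len(tokens),
    }


def precision_recall(gold: List[str], predicted: List[str], label: str) -> Tuple[float, float]:
    """Precision and recall for a single document type."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Feature values like these could be discretized and passed to any Bayesian classifier alongside word features. Comparing precision and recall with and without them is the kind of evaluation the abstract describes.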


Bayes methods; Bayesian statistical decision theory; Capitalization; Character sets; Classification; Document classification; Document handling; Document type; Document type classification; Feature selection; Font; K-dependence Bayesian categorization engine; OCR; Text categorization


Computer Engineering | Databases and Information Systems | Electrical and Computer Engineering | Theory and Algorithms


Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving of the final published version. If a post-print (the author's peer-reviewed manuscript) is allowed and available, or if publisher policy changes, the item will be deposited.
