Feature Selection for Document Type Classification
Fifth International Conference on Information Technology: New Generations
Las Vegas, NV
In this paper, we report on the identification of document type using a k-dependence Bayesian categorization engine. In particular, we show that the use of font and capitalization as features improves precision and recall.
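To make the abstract concrete, the following is a minimal sketch of how capitalization could be turned into numeric features and how precision and recall could be computed for one document type. It is not the authors' k-dependence Bayesian engine. The function names, the two ratios, and the example labels are assumptions made for illustration. Font features are omitted because plain text carries no font information.

```python
from typing import Dict, Hashable, Sequence, Tuple


def capitalization_features(text: str) -> Dict[str, float]:
    """Return simple capitalization ratios for a block of text.

    Hypothetical feature set: the source does not specify how
    capitalization is encoded.
    """
    # Keep only tokens that contain at least one letter.
    tokens = [t for t in text.split() if any(c.isalpha() for c in t)]
    if not tokens:
        return {"all_caps_ratio": 0.0, "title_case_ratio": 0.0}
    # Tokens whose letters are all uppercase, e.g. "REPORT".
    all_caps = sum(1 for t in tokens if t.isupper())
    # Tokens that start with an uppercase letter but are not all caps.
    title_case = sum(1 for t in tokens if t[0].isupper() and not t.isupper())
    return {
        "all_caps_ratio": all_caps / len(tokens),
        "title_case_ratio": title_case / len(tokens),
    }


def precision_recall(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable
) -> Tuple[float, float]:
    """Compute precision and recall for a single document type."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Feature dictionaries of this kind could be passed to any categorizer alongside word features. Comparing precision and recall with and without them is one way to check whether they help for a given document type.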
Bayes methods; Bayesian statistical decision theory; Capitalization; Character sets; Classification; Document classification; Document handling; Document type; Document type classification; Feature selection; Font; K-dependence Bayesian categorization engine; OCR; Text categorization
Computer Engineering | Databases and Information Systems | Electrical and Computer Engineering | Theory and Algorithms
Use Find in Your Library, contact the author, or request an interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (the author's peer-reviewed manuscript) is allowed and available, or if publisher policy changes, the item will be deposited.
Feature Selection for Document Type Classification. Presentation at Fifth International Conference on Information Technology: New Generations, Las Vegas, NV.
Available at: http://digitalscholarship.unlv.edu/ece_presentations/23