Feature Selection for Document Type Classification
Meeting name
Fifth International Conference on Information Technology: New Generations
Document Type
Conference Proceeding
Meeting location
Las Vegas, NV
Publication Date
4-7-2008
Abstract
In this paper, we report on the identification of document type using a k-dependence Bayesian categorization engine. In particular, we show that the use of font and capitalization as features improves precision and recall.
Keywords
Bayes methods; Bayesian statistical decision theory; Capitalization; Character sets; Classification; Document classification; Document handling; Document type; Document type classification; Feature selection; Font; K-dependence Bayesian categorization engine; OCR; Text categorization
Disciplines
Computer Engineering | Databases and Information Systems | Electrical and Computer Engineering | Theory and Algorithms
Permissions
Use Find in Your Library, contact the author, or use interlibrary loan to obtain a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.
Repository Citation
Taghva, K., Vergara, J. (2008, April). Feature Selection for Document Type Classification. Presentation at Fifth International Conference on Information Technology: New Generations, Las Vegas, NV.
Available at: https://digitalscholarship.unlv.edu/ece_presentations/23