Award Date
1-1-2007
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Committee Member
Evangelos Yfantis
Number of Pages
72
Abstract
In a system where medical paper document images have been converted to a digital format by a scanning operation, understanding the document types that exists in this system could provide for vital data indexing and retrieval. In a system where millions of document images have been scanned, it is infeasible to expect a supervised based algorithm or a tedious (human based) effort to discover the document types. The most sensible and practical way to do that is an unsupervised algorithm. Many clustering techniques have been developed for unsupervised classification. Many rely on all data being presented at once, the number of clusters to be known, or both. Presented in this thesis is a clustering scheme that is a two-threshold based technique relying on a hierarchical decomposition of the features. On a subset of document images, it discovers document types at an acceptable level and confidently classifies unknown document images.
Keywords
Documents; Image; Learning; Types; Unsupervised
Controlled Subject
Computer science
File Format
File Size
2211.84 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Curtis, Dean Patrick, "Unsupervised learning of document image types" (2007). UNLV Retrospective Theses & Dissertations. 2227.
http://dx.doi.org/10.25669/kwp5-6gew
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/
COinS