Award Date


Degree Type


Degree Name

Master of Science (MS)


Computer Science

First Committee Member

Kazem Taghva

Number of Pages



A method of sequentially presented document determination using parallel analyses from various facets of structural document understanding and information retrieval is proposed in this thesis. Specifically, the method presented here intends to serve as a trainable system when determining where one document ends and another begins. Content analysis methods include use of the Vector Space Model, as well as targeted analysis of content on the margins of document fragments. Structural analysis for this implementation has been limited to simple and ubiquitous entities, such as software-generated zones, simple format-specific lines, and the appearance of page numbers. Analysis focuses on change in similarity between comparisons, with the emphasis placed on the fact that the extremities of documents tend to contain significant structural and lexical changes that can be observed and quantified. We combine the various features using nonlinear approximation (neural network) and experimentally test the usefulness of the combinations.


Analysis; Boundary; Determination; Document; Lexical; Structural

Controlled Subject

Computer science

File Format


File Size

1413.12 KB

Degree Grantor

University of Nevada, Las Vegas




If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to and include clear identification of the work, preferably with URL.


IN COPYRIGHT. For more information about this rights statement, please visit