Master of Science (MS)
Number of Pages
Today's optical character recognition (OCR) devices ordinarily are not capable of delimiting or "marking up" specific structural information about the document such as the title, its authors, and titles of sections. Such information appears in the OCR device output, but would require a human to go through the output to locate the information. This type of information is highly useful for information retrieval (IR), allowing users much more flexibility in making queries of a retrieval system. This thesis will describe the design, implementation, and evaluation of a software system called Autotag. This system will automatically markup structural information in OCR-generated text. It will also establish a mapping between objects in page images and their corresponding ASCII representation. This mapping can then be used to design flexible image-based interfaces for information retrieval related applications.
Autotag; Collection; Creating; Document; Material; Printed; Structures; Tool
University of Nevada, Las Vegas
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to email@example.com and include clear identification of the work, preferably with URL.
Condit, Allen S, "Autotag: A tool for creating structured document collections from printed materials" (1994). UNLV Retrospective Theses & Dissertations. 437.
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/