Award Date
1-1-1994
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Number of Pages
44
Abstract
Today's optical character recognition (OCR) devices ordinarily are not capable of delimiting or "marking up" specific structural information about a document, such as its title, its authors, and the titles of its sections. Such information appears in the OCR device output, but a human would have to go through the output to locate it. This type of information is highly useful for information retrieval (IR), allowing users much more flexibility in making queries of a retrieval system. This thesis will describe the design, implementation, and evaluation of a software system called Autotag. This system will automatically mark up structural information in OCR-generated text. It will also establish a mapping between objects in page images and their corresponding ASCII representations. This mapping can then be used to design flexible image-based interfaces for applications related to information retrieval.
Keywords
Autotag; Collection; Creating; Document; Material; Printed; Structures; Tool
Controlled Subject
Computer science
File Size
2385.92 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Condit, Allen S, "Autotag: A tool for creating structured document collections from printed materials" (1994). UNLV Retrospective Theses & Dissertations. 437.
http://dx.doi.org/10.25669/hhva-1mg0
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/