Award Date

1-1-1994

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

Today's optical character recognition (OCR) devices ordinarily cannot delimit, or "mark up," structural information about a document, such as its title, its authors, and its section titles. This information appears in the OCR output, but a human must read through that output to locate it. Such information is highly useful for information retrieval (IR) because it gives users much more flexibility in querying a retrieval system. This thesis describes the design, implementation, and evaluation of a software system called Autotag. The system automatically marks up structural information in OCR-generated text. It also establishes a mapping between objects in page images and their corresponding ASCII representations. This mapping can then be used to design flexible image-based interfaces for information retrieval applications.

Keywords

Autotag; Collection; Creating; Document; Material; Printed; Structures; Tool

Controlled Subject

Computer science

File Format

pdf

File Size

2385.92 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Permissions

If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.

Repository Citation

Condit, Allen S, "Autotag: A tool for creating structured document collections from printed materials" (1994). UNLV Retrospective Theses & Dissertations. 437.
http://dx.doi.org/10.25669/hhva-1mg0

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Autotag: A tool for creating structured document collections from printed materials

Author

Allen S. Condit