Award Date
1-1-2003
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Committee Member
Kazem Taghva
Number of Pages
36
Abstract
Optical character recognition (OCR) technology is typically used to convert hard-copy printed material into electronic form. Many presentational artifacts, such as end-of-line hyphenations and running headers and footers, are converted literally. These artifacts can hinder proximity and exact-match searching. This thesis develops an algorithm to extract running headers and footers from electronic documents generated by OCR. The method associates each page of the document with its neighboring pages and detects headers and footers by comparing the page with those neighbors. Experiments are also conducted to test the effectiveness of the algorithm.
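The neighbor-comparison idea described in the abstract can be illustrated with a short Python sketch. This is not the thesis's algorithm, whose details are not given in this record; it is a minimal illustration under stated assumptions. The function names (normalize, similar, find_running_lines), the parameters (window, depth, threshold), the digit masking for page numbers, and the use of difflib's similarity ratio to tolerate OCR errors are all hypothetical choices made for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(line):
    """Lowercase, trim, and mask digit runs so 'Page 3' and 'Page 4' compare equal."""
    return re.sub(r"\d+", "#", line.strip().lower())


def similar(a, b, threshold=0.8):
    """Fuzzy match that tolerates a few OCR character errors (assumed threshold)."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def find_running_lines(pages, window=1, depth=2, threshold=0.8):
    """For each page, return indices of lines that look like running headers/footers.

    pages:  list of pages, each a list of text lines
    window: number of neighboring pages to compare on each side (assumption)
    depth:  number of lines at the top and bottom treated as candidates (assumption)
    """
    flagged = []
    for i, page in enumerate(pages):
        lo, hi = max(0, i - window), min(len(pages), i + window + 1)
        neighbors = [pages[j] for j in range(lo, hi) if j != i]
        hits = set()
        for k in range(min(depth, len(page))):
            # pos = k is the k-th line from the top, pos = -1 - k the k-th from the bottom.
            for pos in (k, -1 - k):
                if any(len(nb) > k and similar(page[pos], nb[pos], threshold)
                       for nb in neighbors):
                    hits.add(pos % len(page))
        flagged.append(sorted(hits))
    return flagged


# Toy document: the header on the first page contains an OCR error ("Rep0rt").
pages = [
    ["Annual Rep0rt", "aaaa aaaa", "bbbb bbbb", "Page 1"],
    ["Annual Report", "cccc cccc", "dddd dddd", "Page 2"],
    ["Annual Report", "eeee eeee", "ffff ffff", "Page 3"],
]
print(find_running_lines(pages))  # [[0, 3], [0, 3], [0, 3]]
```

Comparing lines by position from the top and bottom of the page, rather than searching the whole page, keeps the sketch simple; a more robust variant would also have to handle headers that shift position or alternate between odd and even pages.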
Keywords
Documents; Footers; Headers; Identification; Noisy
Controlled Subject
Computer science
File Format
File Size
1075.2 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Liu, Qin, "Identification of headers and footers in noisy documents" (2003). UNLV Retrospective Theses & Dissertations. 1611.
http://dx.doi.org/10.25669/r00m-5nzp
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/