Feature recognition in Ocr text

Jeffrey Todd Gilbreth, University of Nevada, Las Vegas

Abstract

This thesis investigates the recognition and extraction of special word sequences, representing concepts, from OCR text. Unlike general index terms, concepts can consist of one or more terms that combined, have higher retrieval value than the terms alone (i.e. acronyms, proper nouns, phrases). An algorithm to recognize acronyms and their definitions will be presented. An evaluation of the algorithm will also be presented.