Award Date

1-1-2008

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Kazem Taghva

Number of Pages

61

Abstract

Technical papers are generally accompanied by a list of bibliographic citations that support their arguments and the merits of the approach presented. Each citation is made up of parts such as author, journal, volume, and book. This thesis addresses the problem of extracting citations from a written document and correctly separating each one into its parts. We use an Information Extraction (IE) technique based on a Hidden Markov Model (HMM) to solve this problem. The solution consists of the design of an HMM, the training of the HMM with tagged data, and an implementation of the Forward Chaining algorithm for extracting citation parts. On a test collection of 150 citations, the approach achieved a recall of 0.8 and a precision of 0.81.
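The pipeline described in the abstract (design an HMM, train it on tagged citations, then label the tokens of a new citation) can be illustrated with a small sketch. Everything here is an assumption for illustration, not the thesis's implementation: the field labels, the two-citation toy training set, lowercasing, and add-one (Laplace) smoothing are hypothetical choices. The abstract names a Forward Chaining algorithm for extraction; this sketch instead swaps in Viterbi decoding, the standard algorithm for finding the most likely HMM state sequence. The field-level precision/recall definition (a field counts as correct only if both its label and its exact text match) is likewise an assumed scoring scheme.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: each citation is a list of (token, field) pairs.
TRAIN = [
    [("Smith", "AUTHOR"), ("J.", "AUTHOR"), ("Parsing", "TITLE"), ("citations", "TITLE"),
     ("JASIST", "JOURNAL"), ("12", "VOLUME"), ("2005", "YEAR")],
    [("Lee", "AUTHOR"), ("K.", "AUTHOR"), ("Hidden", "TITLE"), ("models", "TITLE"),
     ("JASIST", "JOURNAL"), ("7", "VOLUME"), ("2003", "YEAR")],
]

def train(sequences):
    """Count start, transition, and emission events from tagged citations."""
    start = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    states, vocab = set(), set()
    for seq in sequences:
        prev = None
        for token, state in seq:
            word = token.lower()  # assumption: case-insensitive emissions
            states.add(state)
            vocab.add(word)
            emit[state][word] += 1
            if prev is None:
                start[state] += 1
            else:
                trans[prev][state] += 1
            prev = state
    return states, vocab, start, trans, emit

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` given raw counts."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(tokens, model):
    """Return the most likely field label for each token (Viterbi decoding)."""
    states, vocab, start, trans, emit = model
    states = sorted(states)
    n_states, n_words = len(states), len(vocab) + 1  # +1 reserves mass for unseen words
    words = [t.lower() for t in tokens]
    best = [{s: log_prob(start, s, n_states) + log_prob(emit[s], words[0], n_words)
             for s in states}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(((p, best[i - 1][p] + log_prob(trans[p], s, n_states))
                               for p in states), key=lambda x: x[1])
            best[i][s] = score + log_prob(emit[s], words[i], n_words)
            back[i][s] = prev
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def fields(tokens, labels):
    """Group consecutive tokens sharing a label into (label, text) fields."""
    out = []
    for tok, lab in zip(tokens, labels):
        if out and out[-1][0] == lab:
            out[-1] = (lab, out[-1][1] + " " + tok)
        else:
            out.append((lab, tok))
    return out

def precision_recall(pred_fields, gold_fields):
    """Field-level precision and recall over (label, text) pairs."""
    pred, gold = Counter(pred_fields), Counter(gold_fields)
    correct = sum((pred & gold).values())
    return correct / sum(pred.values()), correct / sum(gold.values())

if __name__ == "__main__":
    model = train(TRAIN)
    citation = ["Smith", "J.", "Hidden", "models", "JASIST", "12", "2003"]
    labels = viterbi(citation, model)
    print(fields(citation, labels))
```

Scoring in log space avoids numeric underflow on long citations, and smoothing keeps unseen words and transitions from zeroing out every path. A real system would train on far more tagged citations and would likely add features (punctuation, digit patterns) that this toy model ignores.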

Keywords

Hidden Markov Model; References; Standardization

Controlled Subject

Computer science

File Format

pdf

File Size

1843.2 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/