Award Date
1-1-2008
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Kazem Taghva
Number of Pages
61
Abstract
Technical papers are generally accompanied by a list of bibliographic citations that support their arguments and the merits of the approach presented. Each citation is made up of parts such as the author, journal, volume, and book. This thesis addresses the problem of extracting a citation from a written document and correctly separating it into these parts. We solve this problem with an Information Extraction (IE) technique based on a Hidden Markov Model (HMM). The solution consists of the design of an HMM, the training of the HMM on tagged data, and an implementation of the Forward Chaining algorithm for extracting citation parts. Tests on a collection of 150 citations yield a recall of 0.8 and a precision of 0.81.
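The abstract summarizes the pipeline in a single sentence: design an HMM, train it on tagged citations, then extract the fields. As an illustration only, the Python sketch below shows one common way such a pipeline can be built. Hidden states are citation fields, observations are tokens, and start, transition and emission probabilities are estimated by counting over hand-tagged examples with add-one smoothing. The label set, the tiny `TRAIN` corpus, and the names `train_hmm` and `viterbi` are hypothetical. Extraction here uses standard Viterbi decoding as a stand-in; it is not the thesis's Forward Chaining implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tagged data: each citation is a list of (token, label) pairs.
# The field labels are illustrative assumptions, not the thesis's HMM design.
TRAIN = [
    [("Smith", "author"), ("J.", "author"), ("Hidden", "title"), ("models", "title"),
     ("JACM", "journal"), ("12", "volume"), ("1999", "year")],
    [("Jones", "author"), ("A.", "author"), ("Parsing", "title"), ("citations", "title"),
     ("IJDAR", "journal"), ("7", "volume"), ("2004", "year")],
]


def train_hmm(tagged):
    """Estimate start, transition and emission log-probabilities by counting
    over tagged sequences, with add-one (Laplace) smoothing."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    states, vocab = set(), set()
    for seq in tagged:
        prev = None
        for token, label in seq:
            tok = token.lower()
            states.add(label)
            vocab.add(tok)
            emit[label][tok] += 1
            if prev is None:
                start[label] += 1
            else:
                trans[prev][label] += 1
            prev = label
    states = sorted(states)
    n = len(states)
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens

    def log_start(s):
        return math.log((start[s] + 1) / (len(tagged) + n))

    def log_trans(a, b):
        return math.log((trans[a][b] + 1) / (sum(trans[a].values()) + n))

    def log_emit(s, token):
        return math.log((emit[s][token.lower()] + 1) / (sum(emit[s].values()) + v))

    return states, log_start, log_trans, log_emit


def viterbi(tokens, model):
    """Return the most probable field label for each token under the HMM."""
    if not tokens:
        return []
    states, log_start, log_trans, log_emit = model
    # best[t][s]: highest log-score of any label path ending in state s at position t
    best = [{s: log_start(s) + log_emit(s, tokens[0]) for s in states}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans(p, s)) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit(s, tokens[t])
            back[t][s] = prev
    # Start from the best final state and follow back-pointers to the beginning.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    model = train_hmm(TRAIN)
    print(viterbi(["Jones", "A.", "Hidden", "models", "IJDAR", "7", "1999"], model))
    # prints ['author', 'author', 'title', 'title', 'journal', 'volume', 'year']
```

Working in log space avoids numeric underflow on long citations, and add-one smoothing keeps unseen tokens from zeroing out every path. A real system would need a far larger tagged corpus and a richer treatment of unknown words (for example, features for digits or capitalization).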
Keywords
Hidden Markov Model; References; Standardization
Controlled Subject
Computer science
File Format
File Size
1843.2 KB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Permissions
If you are the rightful copyright holder of this dissertation or thesis and wish to have the full text removed from Digital Scholarship@UNLV, please submit a request to digitalscholarship@unlv.edu and include clear identification of the work, preferably with URL.
Repository Citation
Sambamurthy, Swamynathan, "Standardization of references using Hidden Markov Model" (2008). UNLV Retrospective Theses & Dissertations. 2384.
http://dx.doi.org/10.25669/fo5c-uxhi
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/