Award Date
12-2010
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Kazem Taghva, Chair
Second Committee Member
Ajoy K. Datta
Third Committee Member
Laxmi P. Gewali
Graduate Faculty Representative
Muthukumar Venkatesan
Number of Pages
47
Abstract
The World Wide Web is a source of a huge amount of unlabeled information spread across different sources in varied formats. This presents both opportunities and challenges in leveraging such a large amount of unstructured data to build knowledge bases and to extract relevant information.
As part of this thesis, a semi-supervised technique called “Dual Iterative Pattern Relation Extraction” (DIPRE), proposed by Sergey Brin, is selected for further investigation. DIPRE exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample.
This project, built in Java using the "Google AJAX Search API", covers the design, implementation, and testing of the DIPRE approach for extracting various relations from the web.
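The pattern/relation duality described in the abstract can be illustrated with a minimal Java sketch. It is not the thesis implementation. It assumes a tiny in-memory list of sentences in place of web search results, treats each single occurrence of a known pair as one pattern (prefix, middle, suffix), and skips the pattern generalization and specificity checks of the full technique. The class name `DipreSketch`, its methods, and the sample corpus are all hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of DIPRE-style bootstrapping; not the thesis code.
final class DipreSketch {

    // Assumed stand-in for pages that a web search would return.
    static final List<String> DEMO_CORPUS = List.of(
            "The novel Foundation was written by Isaac Asimov.",
            "The novel Dune was written by Frank Herbert.",
            "The novel Emma was written by Jane Austen.",
            "Neuromancer, a book by William Gibson, won awards.",
            "Dune, a book by Frank Herbert, won awards.");

    // Relation -> patterns: each occurrence of a known (title, author) pair
    // becomes a regex made of its literal prefix, middle, and suffix.
    static Set<String> inducePatterns(Set<List<String>> pairs, List<String> corpus) {
        Set<String> patterns = new LinkedHashSet<>();
        for (String sentence : corpus) {
            for (List<String> pair : pairs) {
                String title = pair.get(0);
                String author = pair.get(1);
                int t = sentence.indexOf(title);
                if (t < 0) continue;
                int a = sentence.indexOf(author, t + title.length());
                if (a < 0) continue;
                String prefix = sentence.substring(0, t);
                String middle = sentence.substring(t + title.length(), a);
                String suffix = sentence.substring(a + author.length());
                patterns.add(Pattern.quote(prefix) + "(.+?)" + Pattern.quote(middle)
                        + "(.+?)" + Pattern.quote(suffix));
            }
        }
        return patterns;
    }

    // Patterns -> relation: every sentence that fully matches a pattern
    // yields a candidate (title, author) pair.
    static Set<List<String>> applyPatterns(Set<String> patterns, List<String> corpus) {
        Set<List<String>> found = new LinkedHashSet<>();
        for (String regex : patterns) {
            Pattern compiled = Pattern.compile(regex);
            for (String sentence : corpus) {
                Matcher m = compiled.matcher(sentence);
                if (m.matches()) {
                    found.add(List.of(m.group(1), m.group(2)));
                }
            }
        }
        return found;
    }

    // Alternate the two steps until no new pair appears or maxRounds is reached.
    static Set<List<String>> grow(Set<List<String>> seeds, List<String> corpus, int maxRounds) {
        Set<List<String>> relation = new LinkedHashSet<>(seeds);
        for (int round = 0; round < maxRounds; round++) {
            Set<String> patterns = inducePatterns(relation, corpus);
            boolean changed = relation.addAll(applyPatterns(patterns, corpus));
            if (!changed) break;
        }
        return relation;
    }

    public static void main(String[] args) {
        Set<List<String>> seeds = Set.of(List.of("Foundation", "Isaac Asimov"));
        System.out.println(grow(seeds, DEMO_CORPUS, 5));
    }
}
```

Starting from the single seed pair, the first round learns the "was written by" pattern and picks up two more pairs. One of those pairs then exposes the "a book by" pattern in the second round, which yields a fourth pair. A real system would also need to filter overly general patterns so that errors do not compound across rounds.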
Keywords
Dual Iterative Pattern Relation Extraction (DIPRE); Machine Learning; Pattern Extraction; Pattern recognition systems; Search engines — Programming
Disciplines
Computer Sciences
File Format
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Mettu, Praveena, "Pattern extraction from the world wide web" (2010). UNLV Theses, Dissertations, Professional Papers, and Capstones. 741.
http://dx.doi.org/10.34917/2021406
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/