Master of Science in Computer Science
First Committee Member
Kazem Taghva, Chair
Second Committee Member
Ajoy K. Datta
Third Committee Member
Laxmi P. Gewali
Graduate Faculty Representative
Number of Pages
The World Wide Web is a source of a huge amount of unlabeled information, spread across different sources in varied formats. This presents both opportunities and challenges in leveraging such a large amount of unstructured data to build knowledge bases and to extract relevant information.
As part of this thesis, a semi-supervised bootstrapping technique called “Dual Iterative Pattern Relation Extraction” (DIPRE), proposed by Sergey Brin, was selected for further investigation. DIPRE exploits the duality between sets of patterns and relations: a small seed sample of the target relation is used to find patterns, and those patterns are in turn used to find new instances of the relation, so that the relation grows with each iteration.
This project, built in Java using the Google AJAX Search API, covers the design, implementation, and testing of the DIPRE approach for extracting various relationships from the web.
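The pattern/relation duality described above can be illustrated with a minimal Java sketch. It is not the thesis implementation: the class name `DipreSketch`, the in-memory corpus (standing in for web search results), the reduction of a pattern to a (middle, suffix) pair, and the capitalized-word heuristics for names and titles are all assumptions made for illustration. The pattern specificity checks of the original DIPRE proposal are omitted.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DipreSketch {
    // Assumed heuristics (not from the thesis): an author is two or more
    // capitalized words; a title starts with a capital letter and runs
    // lazily until the pattern's suffix is reached.
    static final String NAME = "([A-Z][a-z]+(?: [A-Z][a-z]+)+)";
    static final String TITLE = "([A-Z][A-Za-z ]*?)";
    static final int SUFFIX_LEN = 4;      // characters kept after the title
    static final int MAX_MIDDLE_LEN = 20; // skip overly long middles

    /** Derives (middle, suffix) patterns from occurrences of known pairs. */
    static Set<List<String>> findPatterns(List<String> corpus, Set<List<String>> pairs) {
        Set<List<String>> patterns = new LinkedHashSet<>();
        for (String text : corpus) {
            for (List<String> pair : pairs) {
                String author = pair.get(0);
                String title = pair.get(1);
                int a = text.indexOf(author);
                if (a < 0) continue;
                int authorEnd = a + author.length();
                int t = text.indexOf(title, authorEnd); // author must precede title
                if (t < 0) continue;
                int titleEnd = t + title.length();
                String middle = text.substring(authorEnd, t);
                String suffix = text.substring(titleEnd,
                        Math.min(text.length(), titleEnd + SUFFIX_LEN));
                if (middle.isEmpty() || middle.length() > MAX_MIDDLE_LEN
                        || suffix.isEmpty()) continue;
                patterns.add(Arrays.asList(middle, suffix));
            }
        }
        return patterns;
    }

    /** Applies each pattern to the corpus to extract (author, title) pairs. */
    static Set<List<String>> findPairs(List<String> corpus, Set<List<String>> patterns) {
        Set<List<String>> pairs = new LinkedHashSet<>();
        for (List<String> p : patterns) {
            Pattern regex = Pattern.compile(
                    NAME + Pattern.quote(p.get(0)) + TITLE + Pattern.quote(p.get(1)));
            for (String text : corpus) {
                Matcher m = regex.matcher(text);
                while (m.find()) {
                    pairs.add(Arrays.asList(m.group(1), m.group(2)));
                }
            }
        }
        return pairs;
    }

    /** Alternates between the two steps until the relation stops growing. */
    static Set<List<String>> run(List<String> corpus, Set<List<String>> seeds,
                                 int maxIterations) {
        Set<List<String>> relation = new LinkedHashSet<>(seeds);
        for (int i = 0; i < maxIterations; i++) {
            Set<List<String>> patterns = findPatterns(corpus, relation);
            boolean grew = relation.addAll(findPairs(corpus, patterns));
            if (!grew) break;
        }
        return relation;
    }

    public static void main(String[] args) {
        // Toy corpus, invented for this illustration.
        List<String> corpus = Arrays.asList(
                "Isaac Asimov wrote The Robots of Dawn in 1983.",
                "Charles Dickens wrote Great Expectations in 1861.",
                "Jane Austen wrote Emma in 1815.",
                "Jane Austen published Emma in three volumes.",
                "Mary Shelley published Frankenstein in London.");
        Set<List<String>> seeds = new LinkedHashSet<>();
        seeds.add(Arrays.asList("Isaac Asimov", "The Robots of Dawn"));
        for (List<String> pair : run(corpus, seeds, 5)) {
            System.out.println(pair.get(0) + " -> " + pair.get(1));
        }
    }
}
```

On this toy corpus, the single seed pair yields the pattern with middle " wrote ", which finds two further pairs. One of those pairs then reveals a second pattern with middle " published ", which finds a fourth pair. This is the growth from a small sample that the duality makes possible.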
Dual Iterative Pattern Relation Extraction (DIPRE); Machine Learning; Pattern Extraction; Pattern recognition systems; Search engines — Programming
Mettu, Praveena, "Pattern extraction from the world wide web" (2010). UNLV Theses, Dissertations, Professional Papers, and Capstones. 741.