Award Date
2009
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
Advisor 1
Kazem Taghva
First Committee Member
Ajoy K. Datta
Second Committee Member
Laxmi P. Gewali
Graduate Faculty Representative
Muthukumar Venkatesan
Number of Pages
68
Abstract
The result set a search engine produces in response to a user query is often very large. It is typically the user's responsibility to browse the result set and identify the relevant documents. Many tools have been developed to help the user identify the most relevant documents. One such tool is clustering. In this method, closely related documents are grouped based on their contents. Hence, if a document turns out to be relevant, the rest of the documents in its cluster are likely to be relevant as well. Grouping closely related documents and displaying them together therefore makes it easier for a user to sift through the result set and find the related documents.
This thesis addresses the computational overhead that arises when document collections grow very large. We survey clustering methods that use memory efficiently and overcome the computational problems posed by large datasets.
Keywords
Clustering; Datasets; DBSCAN; Large datasets; Memory availability; Tree-based data structures
Disciplines
Databases and Information Systems
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Nemala, Vasanth, "Efficient clustering techniques for managing large datasets" (2009). UNLV Theses, Dissertations, Professional Papers, and Capstones. 72.
http://dx.doi.org/10.34870/1374219
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/