Master of Science in Computer Science
First Committee Member
Ajoy K. Datta
Second Committee Member
Laxmi P. Gewali
Graduate Faculty Representative
Number of Pages
The result set produced by a search engine in response to a user query is often very large, and it is typically the user's responsibility to browse it to identify relevant documents. Many tools have been developed to assist with this task. One such tool is clustering, in which closely related documents are grouped based on their contents. If a document turns out to be relevant, the rest of the documents in its cluster are likely to be relevant as well. Grouping closely related documents and displaying them together therefore makes it easier for a user to sift through the result set and find the documents of interest.
This thesis deals with the computational overhead that arises when document collections grow very large. We survey clustering methods that use memory efficiently and overcome the computational problems associated with large datasets.
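As an illustration of the grouping idea described in the abstract, the following is a minimal sketch and not a method taken from the thesis. It assumes documents are plain strings, measures closeness by Jaccard similarity of their word sets, and merges any pair above a threshold. The names `jaccard`, `cluster_documents`, and the `threshold` value are hypothetical. Because it compares every pair of documents, it also shows the quadratic cost that becomes a problem for large collections.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.3):
    """Group documents whose word sets are (transitively) similar.

    Hypothetical helper for illustration only: it compares every
    pair of documents, so it needs O(n^2) similarity computations.
    """
    tokens = [set(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up search results, used only to exercise the sketch.
results = [
    "python clustering tutorial",
    "clustering tutorial in python",
    "cheap flights to paris",
    "paris flights cheap deals",
]
clusters = cluster_documents(results)
print(clusters)
```

Each inner list holds the indices of documents placed in the same group, so a user who finds one relevant document can jump to the others in its group.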
Clustering; Datasets; DBSCAN; Large datasets; Memory availability; Tree-based data structures
Databases and Information Systems
Nemala, Vasanth, "Efficient clustering techniques for managing large datasets" (2009). UNLV Theses, Dissertations, Professional Papers, and Capstones. 72.