Award Date

2009

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

Advisor 1

Kazem Taghva

First Committee Member

Ajoy K. Datta

Second Committee Member

Laxmi P. Gewali

Graduate Faculty Representative

Muthukumar Venkatesan

Number of Pages

68

Abstract

The result set produced by a search engine in response to a user query is often very large. It is typically the user's responsibility to browse the result set to identify relevant documents. Many tools have been developed to help the user identify the most relevant documents. One such tool is clustering, in which closely related documents are grouped based on their contents. Hence, if a document turns out to be relevant, so are the rest of the documents in its cluster. If all closely related documents can be grouped together and displayed, it becomes easier for the user to sift through the result set and find the related documents.

This thesis deals with the computational overhead that arises when document collections grow very large. We survey clustering methods that use memory efficiently and overcome the computational problems posed by large datasets.

Keywords

Clustering; Datasets; DBSCAN; Large datasets; Memory availability; Tree-based data structures

Disciplines

Databases and Information Systems

Language

English

