Award Date

5-2011

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Kazem Taghva, Chair

Second Committee Member

Ajoy K. Datta

Third Committee Member

Laxmi P. Gewali

Graduate Faculty Representative

Venkatesan Muthukumar

Number of Pages

62

Abstract

Automatic recording of broadcast speech using speech recognition is making continuous progress. As this technology comes into wider use, it adds a new source of data to the pool of information available on the web. This has increased the need to categorize the resulting text by topic for the purpose of information retrieval.

In this thesis, we present an approach to automatically assign a topic, or track a change of topic, in a stream of input data. Our approach is based on Hidden Markov Models and language processing techniques. We treat the input text as a stream of words and use a Hidden Markov Model to assign the most appropriate topic to the text. We then process this output to identify the topic boundaries. The main focus of this thesis is to automatically assign a topic to a specific story.
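
The general idea can be illustrated with a minimal sketch. It is not the implementation from the thesis. It assumes each hidden state is one topic with its own word probabilities, that the model tends to stay in the current topic from one word to the next, and that a boundary is any position where the decoded topic changes. The function names, the `stay_prob` and `unseen_prob` parameters, and the toy probabilities are all hypothetical.

```python
import math


def viterbi_topics(words, topic_word_probs, stay_prob=0.9, unseen_prob=1e-6):
    """Return the most likely topic for each word (Viterbi decoding).

    topic_word_probs maps topic -> {word: probability}. All values here
    are illustrative assumptions, not figures from the thesis.
    """
    topics = sorted(topic_word_probs)
    n = len(topics)
    if not words:
        return []
    # Probability of moving to any one other topic (assumed uniform).
    switch_prob = (1.0 - stay_prob) / (n - 1) if n > 1 else 1.0

    def emit(topic, word):
        # Log-probability of the word under the topic, with a small floor
        # for words the topic has never seen.
        return math.log(topic_word_probs[topic].get(word, unseen_prob))

    def trans(prev, cur):
        if n == 1:
            return 0.0
        return math.log(stay_prob if prev == cur else switch_prob)

    # Uniform prior over the starting topic.
    score = {t: math.log(1.0 / n) + emit(t, words[0]) for t in topics}
    backpointers = []
    for word in words[1:]:
        new_score, pointers = {}, {}
        for cur in topics:
            best_prev = max(topics, key=lambda p: score[p] + trans(p, cur))
            new_score[cur] = score[best_prev] + trans(best_prev, cur) + emit(cur, word)
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final topic.
    path = [max(topics, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def topic_boundaries(path):
    """Return the word positions at which the decoded topic changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy example with made-up probabilities for two topics.
example_probs = {
    "sports": {"game": 0.4, "team": 0.4, "vote": 0.1, "senate": 0.1},
    "politics": {"game": 0.1, "team": 0.1, "vote": 0.4, "senate": 0.4},
}
example_words = ["game", "team", "game", "vote", "senate", "vote"]
example_path = viterbi_topics(example_words, example_probs)
example_boundaries = topic_boundaries(example_path)
```

In this toy run, the first three words decode to "sports" and the last three to "politics", so a single boundary is reported at position 3. A real system would estimate the word and transition probabilities from labeled training text rather than set them by hand.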

Keywords

Automatic indexing; Automatic speech recognition; Hidden Markov models; Information retrieval; Sound recordings

Disciplines

Computer Sciences | Theory and Algorithms

Language

English

