Award Date

8-1-2014

Degree Type

Thesis

Degree Name

Master of Science in Electrical Engineering (MSEE)

Department

Electrical Engineering

First Committee Member

Pushkin Kachroo

Second Committee Member

Emma Regentova

Third Committee Member

Mei Yang

Fourth Committee Member

Haroon Stephen

Number of Pages

79

Abstract

With the enormous amount of linguistic data present on the web, text analysis has become one of the major fields of interest today. The field includes sentiment analysis, information retrieval, text document classification, knowledge-based modeling, content similarity measurement, data clustering, word prediction/correction, and decision making. Managing and processing such data is of vital importance. Because the field is quite broad, our focus is mainly on the extraction of transportation-related social media (Twitter) data and on text categorization/classification, which can be further subdivided into concept discovery, word sense disambiguation, and sentiment analysis, in order to analyze the performance of existing transportation systems worldwide. Concept discovery is the method of extracting the actual concept or context that a text is about. It also allows us to filter out irrelevant data. Word sense disambiguation finds the correct sense in which a word is used in a sentence, and it is a basic necessity for concept discovery. A lot of research has been done in this field, with major improvements. However, when it comes to short texts, the field still seems to be at a nascent stage. Moreover, most methods today require a huge training corpus (database). Assembling such a corpus is a cumbersome task and requires a lot of human effort. Another problem with the existing methods is that they require a set of predefined concepts, from which one concept is chosen and assigned to the text as a label. We consider instead the case of finding a general context. In this work, a novel approach is proposed for word sense disambiguation, which in turn allows us to find the general context. For this purpose, I have used the existing knowledge-based semantic dictionary WordNet. This methodology avoids the need for a huge corpus and works for general context recognition. Our focus is on short texts (tweets), but the concept is easily applicable to text documents as well.
A sentiment measuring technique was applied to the extracted data, and the scores were mapped onto Google Maps based on the location information present in the tweets. This clearly points out the locations where people are more frustrated with the existing transportation system and which need immediate attention for improvements.
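
To make the idea of dictionary-based word sense disambiguation concrete, the following is a minimal sketch of a simplified Lesk-style gloss overlap, a standard textbook technique. It is not the novel approach proposed in the thesis, which the abstract does not detail. The sense inventory `TOY_SENSES`, the sense labels, and the function names are hypothetical stand-ins for WordNet entries, chosen so that the example runs without any external data.

```python
import re

# Hypothetical toy sense inventory standing in for WordNet glosses.
# The sense labels and gloss texts are illustrative assumptions only.
TOY_SENSES = {
    "jam": {
        "jam.traffic": "a crowded mass of vehicles that blocks a road",
        "jam.food": "a sweet spread made from fruit and sugar",
    },
}

# Small stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "in", "on", "with"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, sentence, senses=TOY_SENSES):
    """Pick the sense whose gloss shares the most words with the sentence.

    Returns None when the word has no entry in the sense inventory.
    """
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.get(word, {}).items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


if __name__ == "__main__":
    tweet = "Stuck in a jam again, vehicles not moving on this road"
    print(disambiguate("jam", tweet))
```

Because the choice depends only on dictionary glosses and the words around the target, no labeled training corpus is needed, which is the property the abstract emphasizes. Short texts such as tweets offer few context words, so the overlap is often small or tied; this sketch simply keeps the first sense with the highest count.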

Keywords

Computational linguistics; Context (Linguistics); Semantics – Data processing; Semantics – Network analysis; Social media; Twitter

Disciplines

Computer Engineering | Computer Sciences | Electrical and Computer Engineering

Language

English

