Computer Science Faculty Research

Data Analysis Performance Comparison between Single-Mode and Multi-Mode

Sungchul Lee, University of Nevada, Las Vegas
Ju-Yeon Jo, University of Nevada, Las Vegas
Yoohwan Kim, University of Nevada, Las Vegas

Editors

S. Dascalu, F.C. Harris, Y. Shi (Eds.)

Document Type

Conference Proceeding

Publication Date

1-1-2016

Publication Title

25th International Conference on Software Engineering and Data Engineering, SEDE 2016

Publisher

The International Society for Computers and Their Applications (ISCA)

First page number:

47

Last page number:

52

Abstract

Nowadays, a large volume of data is generated and stored at data centers, universities, and portals. Such data can be processed using single-mode tools such as R or multi-mode tools such as Hadoop. This research compares the performance of those tools with various types and sizes of data. Two algorithms were used for the comparison: Pearson correlation, to analyze the relationship between text data, and Image Similarity MapReduce (ISMR), to analyze picture data. All data were obtained from the Nevada Research Data Center (NRDC). We analyzed text data with R and Maria DB for single mode, and with RHadoop and MapReduce for multi-mode. In our experiments, with 3 GB of text data, the single mode consistently outperformed the multi-mode by a factor of 4 or more. With image data, the single mode outperformed the multi-mode up to about 2,000 images (~8 GB), after which the multi-mode started outperforming the single mode. At 10,000 images (~40 GB), the multi-mode outperformed the single mode by a factor of 4. We learned that, while Hadoop is useful for processing large data, it is not efficient for handling small data. Single-mode tools such as R or MATLAB are more cost-effective for handling small data, up to some point. Deciding the threshold size for choosing single-mode or multi-mode is rather subjective, however, and it needs to be decided based on the types of data, the cost and performance of individual machines, and the cost of development and maintenance. Copyright ISCA, SEDE 2016.
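
For readers unfamiliar with the text-data metric named in the abstract, the following is a minimal single-machine sketch of the Pearson correlation coefficient in Python. It is not the authors' R or RHadoop code; the function name `pearson` and the sample values are hypothetical and serve only to illustrate the formula r = cov(x, y) / (sd(x) * sd(y)).

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences.

    Illustrative helper only (hypothetical name); it is not taken from the paper.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values, not NRDC data.
readings_a = [1.0, 2.0, 3.0]
readings_b = [2.0, 4.0, 6.0]
r = pearson(readings_a, readings_b)
```

In a single-mode setting such a computation runs in one process over data held in memory, whereas a MapReduce formulation would typically split the sums across workers and combine them in a reduce step, which adds coordination overhead on small inputs.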

Keywords

Big data; Hadoop; MapReduce; Maria DB; Pearson correlation; R; RHadoop

Language

English

Repository Citation

Lee, S., Jo, J., Kim, Y. (2016). Data Analysis Performance Comparison between Single-Mode and Multi-Mode. In S. Dascalu, F.C. Harris, Y. Shi (Eds.), 25th International Conference on Software Engineering and Data Engineering, SEDE 2016 47-52. The International Society for Computers and Their Applications (ISCA).

Digital Scholarship@UNLV
