Computer Science Faculty Research

Hadoop Performance Analysis Model with Deep Data Locality

Sungchul Lee, University of Wisconsin-Whitewater
Ju-Yeon Jo, University of Nevada, Las Vegas
Yoohwan Kim, University of Nevada, Las Vegas

Document Type

Article

Publication Date

6-27-2019

Publication Title

Information

Publisher

MDPI

Volume

10

Issue

7

First page number:

1

Last page number:

17

Abstract

Background: Hadoop has become the base framework for big data systems, built on the simple concept that moving computation is cheaper than moving data. Hadoop improves system performance by increasing data locality in the Hadoop Distributed File System (HDFS). Increasing the amount of data that is local to the machine processing it reduces network traffic among the nodes of a big data system. Previous research improved Hadoop performance by increasing data locality in a single MapReduce stage. However, there is currently no mathematical performance model for data locality in Hadoop. Methods: This study builds a Hadoop performance analysis model with data locality that covers the entire MapReduce process. The paper explains the data locality concept in the map stage and the shuffle stage, and shows how to apply the performance analysis model to improve Hadoop performance through deep data locality. Results: Three tests (a simulation-based test, a cloud test, and a physical test) show that deep data locality increases Hadoop performance. According to the tests, deep data locality improved the performance of the Hadoop system by over 34%. Conclusions: Deep data locality improves Hadoop performance by reducing data movement in HDFS.

Keywords

MapReduce; Hadoop; Data locality; HDFS; Deep data locality

Disciplines

Computer Sciences

File Format

pdf

File Size

5.061 KB

Language

English

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Repository Citation

Lee, S., Jo, J., Kim, Y. (2019). Hadoop Performance Analysis Model with Deep Data Locality. Information, 10(7), 1-17. MDPI.
http://dx.doi.org/10.3390/info10070222

Included in

Computer Sciences Commons
