In this paper, we investigate real-world scenarios in which the MapReduce programming model, and specifically the Hadoop framework, could be used to process large-scale, geographically scattered datasets. We propose an Adaptive Reduce Task Scheduling (ARTS) algorithm and evaluate it on a distributed Hadoop cluster spanning multiple datacenters as well as on a shared Hadoop cluster. The evaluation demonstrates that the ARTS algorithm outperforms the default Reduce phase scheduling algorithm in the Hadoop framework.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Computer Networks and Communications
- Data center
- Data processing