Scheduling in IaaS Cloud Computing Environments: Anything New?

Invited Lecture, CS Dept., The Hebrew University in Jerusalem (HUJI), Israel

Abstract

The popularity of IaaS cloud computing environments, which are based on data center infrastructure and allow the user to select when, which, and for how long to lease (virtualized) resources, has led to new computing architectures, new workload structures, and growing customer bases. As a consequence, new scheduling approaches may be needed to manage resources efficiently, to quickly and correctly recommend a scheduling configuration to the user, and even to automatically select an appropriate scheduling strategy on behalf of the user.

In this talk we present several recent approaches for scheduling in IaaS cloud computing environments.

We present a comprehensive, empirical performance-cost analysis of provisioning and allocation policies in IaaS clouds, through experimentation in three clouds, including Amazon EC2. We show that policies that dynamically provision and/or allocate resources can achieve better performance and cost than policies that do not, but that failing to adapt to the specific pricing model and technical capabilities of the cloud in use can significantly reduce cost-effectiveness.

We present a comprehensive experimental investigation of the use of IaaS cloud infrastructure for scientific workloads. We perform an empirical evaluation of the performance of four commercial cloud computing services, including Amazon EC2. Through trace-based simulation, we compare the performance characteristics and cost models of clouds and other scientific computing platforms when used for scientific computing workloads. Our results indicate that IaaS clouds may need an order-of-magnitude improvement in performance to be useful to the scientific community.

We discuss an elastic approach to managing MapReduce workloads across pre-provisioned infrastructure and a set of machines that changes dynamically (think malleable jobs in traditional parallel environments). For this approach, we empirically evaluate the performance of three provisioning policies for dynamically resizing MapReduce clusters, and show evidence that elastic MapReduce can offer not only multiple types of isolation (with respect to performance, data management, fault tolerance, and versioning) but also good performance.

We present the ExPERT scheduling framework for Bag-of-Tasks-based workloads. Given a general, user-specified utility function that may consider both makespan and cost, ExPERT traverses a large search space and systematically selects the Pareto-efficient scheduling strategies, that is, the strategies for which no alternative improves one of makespan or cost without worsening the other.
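
To make the notion of Pareto efficiency concrete, the following is a minimal sketch in Python, not the ExPERT implementation. The Strategy type, the candidate values, and the weighted-sum utility are hypothetical and chosen only for illustration; the sketch assumes that lower makespan and lower cost are both better.

```python
from typing import Callable, List, NamedTuple


class Strategy(NamedTuple):
    """Hypothetical scheduling strategy with its estimated outcome."""
    name: str
    makespan: float  # e.g., hours; lower is better
    cost: float      # e.g., currency units; lower is better


def dominates(a: Strategy, b: Strategy) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    no_worse = a.makespan <= b.makespan and a.cost <= b.cost
    strictly_better = a.makespan < b.makespan or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(strategies: List[Strategy]) -> List[Strategy]:
    """Keep only the strategies that no other strategy dominates."""
    return [s for s in strategies
            if not any(dominates(other, s) for other in strategies)]


def best_by_utility(front: List[Strategy],
                    utility: Callable[[Strategy], float]) -> Strategy:
    """Pick the strategy on the front with the lowest utility value (assumed: lower is better)."""
    return min(front, key=utility)


# Illustrative, made-up candidates.
candidates = [
    Strategy("A", makespan=10.0, cost=5.0),
    Strategy("B", makespan=8.0, cost=7.0),
    Strategy("C", makespan=12.0, cost=6.0),  # dominated by A
    Strategy("D", makespan=8.0, cost=9.0),   # dominated by B
]

front = pareto_front(candidates)
choice = best_by_utility(front, lambda s: s.makespan + 2.0 * s.cost)
print([s.name for s in front], choice.name)
```

The pairwise check is quadratic in the number of candidates, which is adequate for a small illustration; a large search space would call for a more efficient method.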

We explore the concept of portfolio scheduling (in this context, the dynamic selection and use of a scheduling policy, depending on the current system and workload conditions, from a portfolio of multiple policies) and show evidence that it can be used to efficiently schedule scientific workloads for the entire data center.
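
The selection step of portfolio scheduling can be sketched as follows. This is a simplified illustration under strong assumptions, not the scheduler from [1]: it assumes a single machine, known job runtimes, a portfolio of three textbook policies (FCFS, SJF, LJF), and mean wait time as the only metric. All function names are hypothetical.

```python
from typing import Callable, Dict, List

Policy = Callable[[List[float]], List[float]]


def fcfs(queue: List[float]) -> List[float]:
    """First-come, first-served: keep the arrival order."""
    return list(queue)


def sjf(queue: List[float]) -> List[float]:
    """Shortest job first."""
    return sorted(queue)


def ljf(queue: List[float]) -> List[float]:
    """Longest job first."""
    return sorted(queue, reverse=True)


def mean_wait(order: List[float]) -> float:
    """Mean wait time when jobs run back to back on one machine (assumption)."""
    elapsed = 0.0
    total_wait = 0.0
    for runtime in order:
        total_wait += elapsed  # this job waits for all jobs scheduled before it
        elapsed += runtime
    return total_wait / len(order) if order else 0.0


def select_policy(portfolio: Dict[str, Policy], queue: List[float]) -> str:
    """Simulate every policy on the current queue and return the name of the best one."""
    return min(portfolio, key=lambda name: mean_wait(portfolio[name](queue)))


portfolio = {"fcfs": fcfs, "sjf": sjf, "ljf": ljf}
queue = [5.0, 1.0, 3.0]  # made-up runtimes of the queued jobs
selected = select_policy(portfolio, queue)
print(selected)
```

In a running system, a selection step like this would be repeated as the queue and the system state change, so that the active policy follows the current conditions.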

This work is based on recently published material [1-5], a digest of many publications from the past seven years, and several upcoming publications.

References:
[1] Ruben Verboon, Kefeng Deng, Alexandru Iosup: A Periodic Portfolio Scheduler for Scientific Computing in the Data Center. JSSPP 2013.
[2] Bogdan Ghit, Nezih Yigitbasi, Dick Epema: Resource Management for Dynamic MapReduce Clusters in Multicluster Systems. MTAGS 2012. Best Paper Award.
[3] David Villegas, Athanasios Antoniou, Seyed Masoud Sadjadi, Alexandru Iosup: An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds. CCGRID 2012: 612-619.
[4] Orna Agmon Ben-Yehuda, Assaf Schuster, Artyom Sharov, Mark Silberstein, Alexandru Iosup: ExPERT: Pareto-Efficient Task Replication on Grids and a Cloud. IPDPS 2012: 167-178.
[5] Alexandru Iosup, Simon Ostermann, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, Dick H. J. Epema: Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing. IEEE Trans. Parallel Distrib. Syst. 22(6): 931-945 (2011).