Invited Lecture, Intel Haifa, Israel
Over the past five years, Infrastructure-as-a-Service clouds have grown into the branch of ICT that offers services related to on-demand lease of storage, computation, and network. One of the major impediments to the selection, and even the use, of (commercial) IaaS clouds is the lack of benchmarking results, that is, the lack of trustworthy quantitative information that allows (potential) cloud users to compare and reason about IaaS clouds.
In this talk we discuss empirical approaches to quantitative evaluation, which we find to be a necessary, if bumpy, road toward cloud benchmarking. Both industry and academia have used empirical approaches for years, but the limited success achieved so far for IaaS clouds and similar systems (e.g., grids) is perhaps indicative of the complexity and size of the real-world challenges. We present the lessons we have learned in developing the SkyMark framework for cloud performance evaluation and the results of our SkyMark-based investigation of three research questions: (1) What is the performance of production IaaS cloud services? (2) How variable is the performance of widely used production cloud services? (3) What is the impact on performance of user-level middleware, such as the provisioning and allocation policies that interact with IaaS clouds? We discuss the impact of our findings on large-scale, many-task, and many-user cloud applications (for which we also characterize and model the workload); notably, we discuss not only cloud performance, but also cloud operation and behavior.
In contrast to previous attempts, our research combines empirical approaches with other approaches, such as modeling and simulation, for deeper analysis; is based on a combination of short-term and multi-year measurements, for better longevity of results; and uses large, comprehensive studies of several real clouds, for broader coverage. This presentation can also provide useful insights for fields related to benchmarking, for example experimental evaluation conducted in any large-scale distributed system.
Last but not least, we present a roadmap toward cloud benchmarking and the way we plan to progress on it with other members of the RG Cloud Group of the Standard Performance Evaluation Corporation (SPEC) [ http://research.spec.org/working-groups/rg-cloud-working-group.html ].
This work is based on recently published material [1-5], a digest of many publications from the past seven years, and several upcoming publications.
[1] Alexandru Iosup, Simon Ostermann, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, Dick H. J. Epema: Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing. IEEE Trans. Parallel Distrib. Syst. 22(6): 931-945 (2011)
[2] Alexandru Iosup, Dick H. J. Epema: Grid Computing Workloads. IEEE Internet Computing 15(2): 19-26 (2011)
[3] David Villegas, Athanasios Antoniou, Seyed Masoud Sadjadi, Alexandru Iosup: An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds. CCGRID 2012: 612-619
[4] Enno Folkerts, Alexander Alexandrov, Kai Sachs, Alexandru Iosup, Volker Markl, Cafer Tosun: Benchmarking in the Cloud: What It Should, Can, and Cannot Be. TPCTC 2012: 173-188
[5] Alexandru Iosup, Nezih Yigitbasi, Dick H. J. Epema: On the Performance Variability of Production Cloud Services. CCGRID 2011: 104-113