Will It Rain Today? Understanding the Weather of Computing Clouds, Before It Happens

EuroSys Shadow PC meeting

Abstract

Cloud computing services play an important role in modern society. They enable daily operation and advances in key application domains, from banking to e-commerce, from science to gaming, from governance to education. Combining technology developed since the 1960s (e.g., modes of resource sharing) with new paradigms that could only have emerged in the 2010s (e.g., FaaS), they promise unprecedented efficiency and seamless access to services for many. However successful it has been, we cannot take the cloud for granted: its core does not yet rely on sound principles of science and design, its engineering is often based on hacking, and there have already been worrying signs of unstable operation. In this talk, we posit that we can address the current challenges by focusing on relatively large complexes of systems (that is, systems of systems or even ecosystems), and by increasing and focusing the effort put into performance experiments, load testing, and benchmarking. We contrast this with the current focus on single or relatively small systems, and on experimentation that is not always principled. We show examples of how our approach could work in practice, presenting (i) results related to performance variability, (ii) discovery methods that feed into the engineering of future load testing and benchmarking frameworks, and (iii) processes that could improve the reproducibility and credibility of experimental results in this field. This leads us to formulate the vision of a community-wide effort to create the Distributed Systems Memex, which would share and preserve operational and especially performance traces collected from the distributed systems that currently underpin our society. Part of this work has been conducted within the international collaboration provided by the SPEC RG Cloud Group.