Tutorial at ICPE
Speakers: Alexandru Iosup, Ana Lucia Varbanescu, Mihai Capota
Processing graphs, especially at large scale, is an increasingly useful activity in a variety of business, engineering, and scientific domains. The knowledge economy is based on data, of which graphs represent an increasing part, in advanced marketing, in social networking, in professional hiring, etc. Science is also increasingly dependent on linked data, in life sciences, in health and bioinformatics services, in academic networks. As a consequence, graph analytics is fast becoming a significant consumer of computing resources, due to ever-larger graphs (up to hundreds of billions of edges) and to the increasing complexity of analysis tasks. Adapting existing algorithms to fit modern architectures and scale with these new requirements is very difficult. Already, there are tens of graph-processing platforms, each with a different design and functionality, such as the distributed Giraph and GraphLab or the GPU-enabled Totem and Medusa. Hadoop and other generic big-data platforms are also important contenders in this field. For graph processing to continue to evolve, users have to find it easy to select a graph-processing platform, and developers and system integrators have to find it easy to quantify the performance and other non-functional aspects of interest.
In this tutorial, we will show how to evaluate and compare graph-processing platforms using the GRAPHALYTICS benchmarking tools. GRAPHALYTICS can be used as a tool to help graph analytics users select a platform to work with, by comparing many different platforms across many classes of graph-processing algorithms and datasets. GRAPHALYTICS can also help application developers understand the performance of their applications on a particular platform and obtain clues about system bottlenecks, by showing the operation of the platforms over time. The metrics targeted by GRAPHALYTICS include Vertices and Edges Processed Per Second (V/EPPS), various scalability and traditional performance metrics, and an estimate of cost normalized by performance.
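To make the throughput and cost-normalized metrics concrete, here is a minimal sketch of how such quantities could be computed for a single benchmark run. The function names and the exact formulas are illustrative assumptions for this tutorial abstract, not the official GRAPHALYTICS implementation.

```python
# Illustrative sketch (assumed formulas, not the official GRAPHALYTICS code).

def veps(num_vertices: int, num_edges: int, makespan_seconds: float) -> float:
    """Vertices and edges processed per second for one run:
    total graph elements divided by the measured makespan."""
    return (num_vertices + num_edges) / makespan_seconds

def cost_normalized_performance(throughput: float, hourly_cost: float) -> float:
    """An estimate of cost-normalized performance:
    throughput per unit of hourly resource cost (higher is better)."""
    return throughput / hourly_cost

# Hypothetical example: a run over a graph with 1M vertices and 10M edges,
# completing in 5 seconds on a cluster costing 4.0 (currency units) per hour.
throughput = veps(1_000_000, 10_000_000, 5.0)        # 2.2 million elements/s
value = cost_normalized_performance(throughput, 4.0)  # 550,000 elements/s per cost unit
```

A platform comparison would then repeat such measurements across algorithms (e.g., BFS, PageRank) and datasets, and report the metrics per platform.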