A Systematic Configuration Space Exploration of the Linux Kyber I/O Scheduler

HotCloudPerf'24, London, May 11, 2024


Abstract

NVMe SSDs have become the de facto storage choice for high-performance I/O-intensive workloads. Often, these workloads are run in a shared setting, such as in multi-tenant clouds where they share access to fast NVMe storage. In such a shared setting, ensuring quality of service among competing workloads can be challenging. To offer performance differentiation to I/O requests, various SSD-optimized I/O schedulers have been designed. However, many of them are either not publicly available or are yet to be proven in a production setting. Among the widely-tested I/O schedulers available in the Linux kernel, Kyber has been shown to be one of the best-fit schedulers for SSDs due to its low CPU overheads and high scalability. However, Kyber has various configuration options, and there is limited knowledge on how to configure Kyber to improve applications' performance. In this paper, we systematically characterize how Kyber's configurations affect the performance of I/O workloads and how this effect differs across file systems and storage devices. We report 11 observations and provide 5 guidelines that indicate that (i) Kyber can deliver up to 26.3% lower read latency than the None scheduler with interfering write workloads; (ii) with a file system, Kyber can be configured to deliver up to 35.9% lower read latency at the cost of 34.5%–50.3% lower write throughput, allowing users to make a trade-off between read latency and write throughput; and (iii) Kyber leads to performance losses when used with multiple throughput-bound workloads and the SSD is not the bottleneck. Our benchmarking scripts and results are open-sourced and available at: https://github.com/stonet-research/hotcloudperf24-kyber-artifact-public.
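
To make the notion of "Kyber's configurations" concrete, the following is a minimal sketch of how Kyber can be selected and its two target-latency knobs (read_lat_nsec and write_lat_nsec, exposed via sysfs in Linux 4.12 and later) adjusted for a block device. The device name nvme0n1, the chosen latency targets, and the use of a Python script rather than the paper's benchmarking scripts are illustrative assumptions; the sysfs paths require root privileges to write.

    # Sketch: select Kyber and tune its per-direction latency targets via sysfs.
    from pathlib import Path

    DEV = "nvme0n1"  # hypothetical device name; replace with the target disk
    QUEUE = Path(f"/sys/block/{DEV}/queue")

    # Select the Kyber I/O scheduler for this block device.
    (QUEUE / "scheduler").write_text("kyber")

    # Kyber exposes two target latencies (in nanoseconds), one for reads and
    # one for writes; the values below are example settings, not recommendations.
    (QUEUE / "iosched" / "read_lat_nsec").write_text(str(1_000_000))    # 1 ms read target
    (QUEUE / "iosched" / "write_lat_nsec").write_text(str(10_000_000))  # 10 ms write target

    # Print the resulting configuration for verification.
    for knob in ("scheduler", "iosched/read_lat_nsec", "iosched/write_lat_nsec"):
        print(knob, "=", (QUEUE / knob).read_text().strip())

Lowering read_lat_nsec relative to write_lat_nsec is the kind of configuration choice whose read-latency/write-throughput trade-off the paper quantifies.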