Unlike traditional computing, where a computer must be told explicitly what to do, deep learning allows a computer to learn autonomously. This has made deep learning well suited to many applications across research, engineering, and daily life, resulting in a surge in popularity. A computer learns by training a neural network on large amounts of data, enabling the network to interpret the data as a human would. To increase the performance of deep learning applications, ever larger and more expressive neural networks have been proposed that can more accurately model complex problems such as face recognition and natural language processing. Training these very large models is increasingly difficult due to their high computational cost and large memory footprint, which often exceeds the memory capacity of a single computing device. Therefore, several approaches to distributed training based on data parallelism and model/pipeline parallelism have emerged. In this work, we provide a theoretical comparison of these distribution models in terms of computation time and memory usage, and introduce DDLBench, a comprehensive open-source benchmark suite to quantify these differences in practice. DDLBench can evaluate the capability of a given system to perform distributed deep learning using a variety of neural networks, datasets, and distribution models. By comparing our theoretical models with the benchmarking results, we show how the performance of real-life implementations diverges from the theoretical predictions, thus requiring benchmarking to capture the in-depth complexity of the frameworks themselves.