
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI

Read our paper entitled “Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation” at


In this work, together with the MVAPICH team from Ohio State University led by Prof. DK Panda, we carried out extensive benchmarking and testing of distributed TensorFlow methods. The goal was to determine which method performs best on various high-performance computing infrastructures. The systems tested ranged from university clusters to the Piz Daint supercomputer, which comprises 5000 GPU-powered compute nodes. The work showed that, depending on the workload, the distribution method and libraries have to be chosen carefully. It also identified a number of bottlenecks in MVAPICH, for which improvements were proposed that are now part of the stable MVAPICH distribution.

