Best Deep Learning Performance By An NVIDIA GPU Card: The Winner Is…

Chloe Shi

There has been much industry debate over which NVIDIA GPU card is best suited for deep learning and machine learning applications. At GTC 2015, NVIDIA CEO and co-founder Jen-Hsun Huang announced the release of the GeForce Titan X, touting it as “the most powerful processor ever built for training deep neural networks.” Within months, NVIDIA proclaimed the Tesla K80 the ideal choice for enterprise-level deep learning applications, citing its enterprise-grade reliability through ECC protection and GPUDirect for clustering as advantages over the Titan X, which is technically a consumer-grade card. Then, in November 2015, NVIDIA released the Tesla M40. At 5x the price of the Titan X, the Tesla M40 was marketed as “The World’s Fastest Deep Learning Training Accelerator.”

With so many “world’s fastest” and “most powerful” claims in such a short period of time, people were understandably confused. As a leader in high-performance computing technologies and Deep Learning Solutions, AMAX had its engineering team benchmark these cards to determine which NVIDIA card performs best for deep learning.

In a whitepaper titled Basic Performance Analysis of NVIDIA GPU Accelerator Cards for Deep Learning Applications, AMAX’s team analyzed the NVIDIA K40, K80, and M40 enterprise GPU cards along with the GeForce GTX Titan X and GTX 980 Ti (water-cooled) consumer-grade cards, running 256×256-pixel image recognition training with Caffe. The systems used in the benchmark tests were AMAX’s DL-E400 (4xGPU workstation), DL-E380 (3U 8xGPU server), and DL-E800 (4U 8xGPU server).

The study included:

  • Card-specific performance analysis
  • Performance scaling from a single-GPU system up to 8-GPU nodes
  • Performance impact of the CPU
  • Single and dual CPU solutions
  • Platform-specific performance differences

The Results

The study found that performance scaled linearly with the number of cards, and that cards based on the Maxwell architecture (Titan X, 980 Ti, M40) outperformed the Kepler cards (K40 and K80).
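To make "linear scaling" concrete, here is a minimal sketch of how scaling efficiency could be computed from measured training throughput. The function name and the throughput figures are hypothetical placeholders chosen for illustration; they are not results from the whitepaper.

```python
def scaling_efficiency(throughputs):
    """Return {gpu_count: efficiency} relative to perfect linear scaling.

    throughputs: dict mapping GPU count to measured throughput
    (e.g. images/second). Must contain an entry for 1 GPU.
    An efficiency of 1.0 means N GPUs deliver exactly N times
    the single-GPU throughput.
    """
    base = throughputs[1]
    return {n: t / (n * base) for n, t in sorted(throughputs.items())}


# Hypothetical placeholder numbers, NOT measurements from the whitepaper.
example = {1: 100.0, 2: 198.0, 4: 392.0, 8: 776.0}

for gpus, eff in scaling_efficiency(example).items():
    print(f"{gpus} GPU(s): {eff:.0%} of ideal linear scaling")
```

Values close to 100% at every GPU count are what a claim of linear scaling implies; a steady drop as cards are added would point to a bottleneck elsewhere in the platform.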

Most interesting was how poorly the K80 performed despite having the highest single-precision TFLOP performance spec.

So which card performed the best in our deep learning benchmark testing? Surprisingly, the water-cooled GTX 980 Ti. The Titan X and M40 came in second, running nearly neck and neck. Since the GTX 980 Ti may not be suitable for server integration, our recommendation for deep learning applications is the Titan X or the M40, with the Titan X providing the best performance-to-cost ratio.
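The performance-to-cost argument can be sketched as simple arithmetic. This example assumes the Titan X and M40 deliver equal performance (an approximation of their roughly tied results) and uses the 5x price difference cited earlier in this article; the function name and the normalized values are illustrative, not figures from the whitepaper.

```python
def perf_per_cost(relative_perf, relative_price):
    """Performance delivered per unit of cost, in normalized units."""
    return relative_perf / relative_price


# Assumption: both cards normalized to the same performance (1.0).
# Prices normalized to the Titan X; the M40 is taken as 5x that price.
titan_x = perf_per_cost(relative_perf=1.0, relative_price=1.0)
m40 = perf_per_cost(relative_perf=1.0, relative_price=5.0)

print(f"Titan X performance-to-cost advantage: {titan_x / m40:.1f}x")
```

Under these assumptions the Titan X delivers about five times the performance per dollar, which is the reasoning behind naming it the better value. The estimate ignores enterprise features such as ECC that may justify the M40's price in some deployments.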

It remains to be seen how the Pascal-based GTX 1080 (replacement for GTX 980, to be released on May 27th, 2016) will perform in comparison, but early feedback is that the “2x better performance than Titan X” statistic relates to VR applications, not deep learning applications.