Recently I tried to implement and deploy a deep learning solution (a multilayer LSTM net with additional layers for static input) on a big CPU server. After many attempts I achieved only a 3x speedup compared to the performance on my personal computer. I've heard that GPUs might do a better job. Could you explain what exactly makes GPUs so much better than CPUs for deep neural network computations?
A GPU's architecture is mainly focused on parallelism, while a CPU's isn't. That means a GPU can do a lot of simple operations at the same time; for example, a GPU can process the color of every pixel on your screen (1920x1080 is almost 2 million pixels) 60 times per second. A general-purpose CPU has on the order of one ALU per core (physical or logical), so your CPU may have 8 or 16 ALUs; a GPU can have thousands of them.
To make a long story short: a CPU can execute a few complex operations very quickly, while a GPU can execute thousands of very simple operations very quickly. Also, because a GPU processes a lot of data at the same time, it usually comes with very high-speed RAM to avoid bottlenecks.
Neural networks are basically a lot of small "computers" working in parallel, so the architecture of a GPU is a much better fit for this task.
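You can see the difference directly by timing the same large matrix multiplication on both devices. Here is a minimal sketch, assuming PyTorch is installed and a CUDA-capable GPU is available (the matrix size is an arbitrary choice for illustration):

```python
# Time one large matrix multiplication on CPU and (if available) GPU.
import time
import torch

N = 4096                         # hypothetical size, large enough to matter
a_cpu = torch.randn(N, N)
b_cpu = torch.randn(N, N)

start = time.perf_counter()
c_cpu = a_cpu @ b_cpu
cpu_time = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu = a_cpu.cuda()
    b_gpu = b_cpu.cuda()
    torch.cuda.synchronize()     # make sure the copies have finished
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()     # GPU kernels run asynchronously
    gpu_time = time.perf_counter() - start
    print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")
else:
    print(f"CPU: {cpu_time:.3f}s  (no GPU available)")
```

On typical hardware the GPU figure comes out one to two orders of magnitude lower, precisely because thousands of simple multiply-add units work on the matrix at once.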
The de facto algorithm for training deep neural nets is back-propagation. It involves computing Jacobian matrices at various levels of the network and multiplying those matrices. The matrix multiplication step is where GPUs outshine CPUs, since the operations involved are highly structured and don't need the complex machinery (branch prediction, out-of-order scheduling) present in CPUs. As a side point, you could argue that CPUs have become much better at matrix multiplication through techniques like cache blocking, prefetching and hand-coded assembly.
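To make the "back-propagation is mostly matrix multiplication" point concrete, here is a minimal NumPy sketch of the backward pass through one fully connected layer (the names x, W and dy are hypothetical placeholders, not from the question):

```python
import numpy as np

batch, n_in, n_out = 64, 512, 256
x  = np.random.randn(batch, n_in)    # layer input
W  = np.random.randn(n_in, n_out)    # layer weights
dy = np.random.randn(batch, n_out)   # gradient arriving from the next layer

# Forward pass: a single matrix product
y = x @ W

# Backward pass: both gradients are again plain matrix products,
# which is exactly the structured workload GPUs are built for.
dW = x.T @ dy        # gradient w.r.t. the weights
dx = dy @ W.T        # gradient passed to the previous layer
```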
Besides training, the inference pass of a neural net also benefits from efficient matrix multiplication, because the inputs to the various layers and the weight (parameter) matrices are stored as tensors.
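For example, a forward pass through two dense layers is nothing but a couple of matrix products over those tensors (again a sketch with hypothetical sizes):

```python
import numpy as np

batch = 128
W1 = np.random.randn(784, 300)    # weights of layer 1
W2 = np.random.randn(300, 10)     # weights of layer 2
x  = np.random.randn(batch, 784)  # a whole batch of inputs stored as one tensor

h      = np.maximum(x @ W1, 0.0)  # layer 1: matmul + ReLU
logits = h @ W2                   # layer 2: another matmul
```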
Another benefit of GPUs is the higher memory bandwidth on offer. GDDR5X gets close to 500 GB/s, compared to the roughly 80-100 GB/s offered by state-of-the-art DDR4, so you get about a 5x bandwidth improvement that memory-intensive neural-net computations can exploit.
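If you want to see why bandwidth matters, you can estimate the effective bandwidth of your own machine with a simple element-wise operation; this is a rough sketch (array size is an arbitrary assumption), and running the equivalent code on GPU tensors typically reports several times the CPU figure:

```python
import time
import numpy as np

n = 200_000_000                   # ~1.6 GB of float64 per array
a = np.ones(n)
b = np.ones(n)

start = time.perf_counter()
c = a + b                         # reads a and b, writes c
elapsed = time.perf_counter() - start

bytes_moved = 3 * n * a.itemsize  # two reads + one write
print(f"effective bandwidth ≈ {bytes_moved / elapsed / 1e9:.1f} GB/s")
```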