How much faster is tensorflow-gpu with AVX and AVX2 compared to without AVX and AVX2?
I tried to find an answer using Google, but had no success. It's hard to recompile tensorflow-gpu
for Windows, so I want to know whether it's worth it.
For floating-point code, the main practical difference between AVX and AVX2 builds is the availability of FMA (fused multiply-add) instructions, which ship alongside AVX2 on most CPUs – both AVX and AVX2 use 256-bit FP registers. The main advantage of the AVX2 ISA itself is for integer code/data types, where you can expect up to a 2x speedup; for FP code, roughly 8% is a good speedup of AVX2 over AVX.
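If you're unsure whether your CPU even supports these extensions, you can check from Python. Below is a minimal sketch using the third-party py-cpuinfo package (`pip install py-cpuinfo`, not part of TensorFlow); the extension names are the standard CPUID flag spellings. The stock wheel also logs a message at import time along the lines of "Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA" when the binary was built without them.

```python
# Minimal sketch: check CPU support for AVX / AVX2 / FMA.
# Assumes the third-party py-cpuinfo package is installed (pip install py-cpuinfo).
import cpuinfo

flags = set(cpuinfo.get_cpu_info()["flags"])
for ext in ("avx", "avx2", "fma"):
    print(ext, "supported" if ext in flags else "not supported")
```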
Generally speaking, GPUs are roughly 3x faster than CPUs for deep learning models, so if most of your model runs on the GPU, CPU instruction-set optimizations like AVX2 matter less.
If your computation is one giant matmul on CPU, you will get about a 3x speed-up on a Xeon v3 (see the benchmark linked here). But it's also possible to see no speed-up at all, presumably because not enough time is spent in high-arithmetic-intensity ops executed on the CPU.
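If you want to measure the effect on your own machine rather than rely on the table below, a single large matmul is the easiest micro-benchmark. Here is a minimal sketch using the TF 1.x session API (the matrix size 8192 is an arbitrary illustrative choice); run it once with the stock wheel and once with an AVX2/FMA build and compare the timings:

```python
# Minimal sketch (TF 1.x API): time one large matmul on the CPU.
import time
import tensorflow as tf

n = 8192
with tf.device("/cpu:0"):
    a = tf.random_normal([n, n])
    b = tf.random_normal([n, n])
    c = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(c)                       # warm-up run
    start = time.time()
    sess.run(c)                       # timed run
    print("matmul time: %.3f s" % (time.time() - start))
```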
Here's a table from the "High Performance Models" guide for training ResNet-50 on CPU with different optimizations. It looks like you can get a ~2.5x speed-up with the best settings:
| Optimization | Data Format | Images/Sec (step time) | Intra threads | Inter threads |
| ------------ | ----------- | ---------------------- | ------------- | ------------- |
| AVX2 | NHWC | 6.8 (147 ms) | 4 | 0 |
| MKL | NCHW | 6.6 (151 ms) | 4 | 1 |
| MKL | NHWC | 5.95 (168 ms) | 4 | 1 |
| AVX | NHWC | 4.7 (211 ms) | 4 | 0 |
| SSE3 | NHWC | 2.7 (370 ms) | 4 | 0 |
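For completeness, the intra/inter thread counts shown in the table are set through the session config. Below is a minimal sketch of the TF 1.x API; the values 4 and 1 simply mirror the MKL rows above, and the right numbers depend on your core count:

```python
# Minimal sketch (TF 1.x API): set the thread pools referenced in the table above.
import tensorflow as tf

config = tf.ConfigProto(
    intra_op_parallelism_threads=4,  # threads available inside a single op (e.g. one conv/matmul)
    inter_op_parallelism_threads=1,  # threads for running independent ops concurrently
)
sess = tf.Session(config=config)
```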
If you are able to compile an optimized version for Windows, it would help to mention it in this issue: https://github.com/yaroslavvb/tensorflow-community-wheels/issues/13 -- it seems there's some demand for such a build.