
How much faster is tensorflow-gpu with AVX and AVX2 than without them?

I tried to find an answer using Google, without success. Recompiling tensorflow-gpu for Windows is hard, so I want to know whether it is worth it.

asked Sep 10 '17 03:09 by Dmitry



1 Answer

If your computation is one giant matmul on CPU, you will get roughly a 3x speed-up on a Xeon v3 (see benchmark here). But it's also possible to see no speed-up at all, presumably because too little of the run time is spent in high-arithmetic-intensity ops executed on the CPU.
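To make the "no speed-up" case concrete, here is a minimal sketch based on Amdahl's law. It is not taken from the benchmark: the function name `overall_speedup` and the CPU fractions are assumptions chosen for illustration, and only the 3x per-op figure comes from the answer.

```python
def overall_speedup(cpu_fraction: float, op_speedup: float) -> float:
    """Amdahl's law: end-to-end speed-up when only `cpu_fraction` of the
    run time (the CPU-bound, vectorizable ops) gets `op_speedup` times faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if op_speedup <= 0.0:
        raise ValueError("op_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / op_speedup)


if __name__ == "__main__":
    # Illustrative fractions only (assumed, not measured): 100%, 50% and 5%
    # of the step time spent in CPU ops that benefit from AVX/AVX2.
    for fraction in (1.0, 0.5, 0.05):
        gain = overall_speedup(fraction, 3.0)
        print(f"{fraction:.0%} of time in vectorizable CPU ops: {gain:.2f}x")
```

With tensorflow-gpu most heavy ops usually run on the GPU, so the CPU fraction may well be small, in which case the end-to-end gain stays close to 1x even if the CPU ops themselves become 3x faster.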

Here's a table from the "High Performance Models" guide for training resnet50 on CPU with different optimizations. It looks like you can get about a 2.5x speed-up with the best settings (AVX2 versus SSE3).

| Optimization | Data Format | Images/Sec (step time) | Intra threads | Inter threads |
| ------------ | ----------- | ---------------------- | ------------- | ------------- |
| AVX2         | NHWC        | 6.8 (147ms)            | 4             | 0             |
| MKL          | NCHW        | 6.6 (151ms)            | 4             | 1             |
| MKL          | NHWC        | 5.95 (168ms)           | 4             | 1             |
| AVX          | NHWC        | 4.7 (211ms)            | 4             | 0             |
| SSE3         | NHWC        | 2.7 (370ms)            | 4             | 0             |
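The speed-up figure can be checked directly from the table. The sketch below divides each row's images/sec by the SSE3 baseline; the throughput values are copied from the table, while the names `THROUGHPUT` and `speedup_over` are made up for this example.

```python
# Images/sec from the table above, keyed by (optimization, data format).
THROUGHPUT = {
    ("AVX2", "NHWC"): 6.8,
    ("MKL", "NCHW"): 6.6,
    ("MKL", "NHWC"): 5.95,
    ("AVX", "NHWC"): 4.7,
    ("SSE3", "NHWC"): 2.7,
}


def speedup_over(baseline, table):
    """Return each configuration's throughput relative to `baseline`."""
    base = table[baseline]
    return {config: rate / base for config, rate in table.items()}


ratios = speedup_over(("SSE3", "NHWC"), THROUGHPUT)

if __name__ == "__main__":
    for (optimization, data_format), ratio in ratios.items():
        print(f"{optimization:5s} {data_format}: {ratio:.2f}x over SSE3")
```

On these numbers AVX2 comes out at about 2.5x over SSE3 and AVX at about 1.7x, so most of the gain in this CPU-only training run already comes from AVX, with AVX2 adding roughly another 1.4x on top.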

If you manage to compile an optimized version for Windows, it would help to mention it in this issue, since there seems to be some demand for such a build: https://github.com/yaroslavvb/tensorflow-community-wheels/issues/13

answered Oct 22 '22 21:10 by Yaroslav Bulatov