Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compile Tensorflow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if Tensorflow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I noticed that it has mentioned SSE4.2 and AVX,

  1. What are SSE4.2 and AVX?
  2. How do these SSE4.2 and AVX improve CPU computations for Tensorflow tasks.
  3. How to make Tensorflow compile using the two libraries?
like image 548
GabrielChu Avatar asked Dec 22 '16 23:12

GabrielChu


2 Answers

I just ran into this same problem, it seems like Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support, adding --copt=-msse4.2 would suffice. In the end, I successfully built with

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

without getting any warning or errors.

Probably the best choice for any system is:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

(Update: the build scripts may be eating -march=native, possibly because it contains an =.)

-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)

I'm not sure what TensorFlow's default for -O2 or -O3 is. gcc -O3 enables full optimization including auto-vectorization, but that sometimes can make code slower.


What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization)

x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.

-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.

TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.

-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)

like image 72
Mike Chiu Avatar answered Nov 10 '22 10:11

Mike Chiu


Let's start with the explanation of why do you see these warnings in the first place.


Most probably you have not installed TF from source and instead of it used something like pip install tensorflow. That means that you installed pre-built (by someone else) binaries which were not optimized for your architecture. And these warnings tell you exactly this: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the part from documentation.

TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included.

Good thing is that most probably you just want to learn/experiment with TF so everything will work properly and you should not worry about it


What are SSE4.2 and AVX?

Wikipedia has a good explanation about SSE4.2 and AVX. This knowledge is not required to be good at machine-learning. You may think about them as a set of some additional instructions for a computer to use multiple data points against a single instruction to perform operations which may be naturally parallelized (for example adding two arrays).

Both SSE and AVX are implementation of an abstract idea of SIMD (Single instruction, multiple data), which is

a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment

This is enough to answer your next question.


How do these SSE4.2 and AVX improve CPU computations for TF tasks

They allow a more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides


How to make Tensorflow compile using the two libraries?

You need to have a binary which was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

like image 140
Salvador Dali Avatar answered Nov 10 '22 10:11

Salvador Dali