How to compile TensorFlow with SSE and AVX instructions on Windows?

Tags:

c++

windows

tensorflow

msbuild

With the latest version of TensorFlow now available on Windows, I am trying to get everything working as efficiently as possible. However, even when compiling from source, I can't figure out how to enable the SSE and AVX instructions.

The default process: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake has no mention of how to do this.

The only reference I have found has been using Google's Bazel: How to compile Tensorflow with SSE4.2 and AVX instructions?

Does anyone know of an easy way to turn on these advanced instructions using MSBuild? I hear they give at least a 3x speedup.

To help those looking for a similar solution, the warning I am currently getting looks like this: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake

I am using Windows 10 Professional on a 64-bit platform, Visual Studio 2015 Community Edition, and Anaconda Python 3.6 with CMake version 3.6.3 (later versions don't work for TensorFlow).

asked Mar 05 '17 01:03

Aerophilic

1 Answer

Well, I tried to fix that, but I am not sure if it really worked.

In CMakeLists.txt you will find the following statements:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)

On the MSVC platform, the test fails because MSVC doesn't support the -march=native flag. I modified the statements as follows:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
    endif()
  endif()
endif()

By doing this, CMake checks whether /arch:AVX is available and uses it. According to MSDN, the /arch:SSE2 setting is the default for x86 builds but is not available as an option for x64 builds. For x64 builds you can choose /arch:AVX or /arch:AVX2. I used AVX above because my CPU only supports AVX; you can try AVX2 if you have a compatible CPU.
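For readers whose CPU does support AVX2, the same idea can be extended to try /arch:AVX2 first and fall back to /arch:AVX. This is only a rough sketch of the block above, not a tested patch: the variable name COMPILER_OPT_ARCH_AVX2_SUPPORTED and the final message() line are introduced here for illustration and are not part of the TensorFlow sources. Note that CHECK_CXX_COMPILER_FLAG only tests whether the compiler accepts the flag, not whether the machine can run the resulting code.

```cmake
# Sketch only: variant of the modified block that prefers AVX2 on MSVC.
# Assumption: the target CPU supports AVX2. The check below only verifies
# that the compiler accepts the flag; on a CPU without AVX2 the resulting
# binary can fail at run time with an illegal-instruction error.
if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  elseif (MSVC)
    # COMPILER_OPT_ARCH_AVX2_SUPPORTED is a hypothetical name for this sketch.
    CHECK_CXX_COMPILER_FLAG("/arch:AVX2" COMPILER_OPT_ARCH_AVX2_SUPPORTED)
    if (COMPILER_OPT_ARCH_AVX2_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX2")
    else()
      CHECK_CXX_COMPILER_FLAG("/arch:AVX" COMPILER_OPT_ARCH_AVX_SUPPORTED)
      if (COMPILER_OPT_ARCH_AVX_SUPPORTED)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
      endif()
    endif()
  endif()
  # Print the final flags so the chosen /arch option shows up in the configure log.
  message(STATUS "CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
endif()
```

Since the whole block is guarded by tensorflow_OPTIMIZE_FOR_NATIVE_ARCH, that option has to be on at configure time for any of this to take effect (for example by passing -Dtensorflow_OPTIMIZE_FOR_NATIVE_ARCH=ON to cmake, if it is not already enabled by default in your checkout).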

When building with the above CMakeLists.txt, compilation was much slower than for the official release, and the warning about AVX/AVX2 disappeared, but the warnings about SSE/SSE2/3/4.1/4.2 remained. I think these warnings can be ignored because MSVC offers no SSE option for x64 builds.

I am testing the new pip package now. It may be faster than before, but I don't want to write a new benchmark ...

If anyone is interested in this, please test whether the new package is really faster.

I did all of this on the latest git master branch as of 2017-3-12. The pip package name shows that it was TensorFlow 1.0.1.

answered Oct 13 '22 01:10

Wesley Ranger
