I was optimizing some C++ code when I ran into a situation that can be simplified as follows.
Consider this code:
#include <iostream>
#include <thread>

using namespace std;

bool hit = false;

void F()
{
    this_thread::sleep_for(chrono::seconds(1));
    hit = true;
}

int main()
{
    thread t(F);
    while (!hit)
        ;
    cout << "finished" << endl;
    t.join();
    return 0;
}
This starts a thread which, after one second, sets hit to true. Meanwhile, main spins in an empty loop that continues until hit becomes true. I compiled this with gcc-5.4 using the -g flag and everything was fine: the program printed finished and exited. But when I compiled it with the -O2 flag, the program got stuck in the loop forever.
Looking at the disassembly, the compiler had generated the following, which is the root cause of the infinite loop:
jmp 0x6ba6f3 ! 0x00000000006ba6f3
OK, so clearly the compiler has deduced that hit is false and will not change inside the loop, so it assumed the loop is infinite, without considering that another thread may change the value! And this optimization is enabled at the higher levels (-O2). Since I'm not an optimization-flag expert, can anyone tell me which flag is responsible for this result so I can turn it off? And would turning it off have any significant performance cost for other code? In other words, how rare is this code pattern?
This code has undefined behavior: you're modifying hit from one thread and reading it from another, without synchronization.
Optimizing the read of hit down to a constant false is a valid outcome of undefined behavior. You can solve this by making hit a std::atomic<bool>. That makes the program well-defined, and blocks the optimization.
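A minimal sketch of that fix (the helper name is made up for illustration, and the delay is shortened so it finishes quickly): the flag becomes a std::atomic<bool> and the spin loop is otherwise unchanged:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Spawns a worker that sets the flag after a short delay, then busy-waits
// on the atomic flag. Returns the final flag value (true once observed).
bool wait_for_flag()
{
    std::atomic<bool> hit{false};

    std::thread t([&hit] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        hit.store(true);        // atomic store, visible to the spinning reader
    });

    while (!hit.load())         // each iteration performs a real atomic load,
        ;                       // so the compiler cannot hoist it out of the loop

    t.join();
    return hit.load();
}
```

Because every hit.load() is an atomic access, the compiler must re-read the flag on each iteration instead of folding the loop into an unconditional jmp; for a bare flag like this, std::memory_order_relaxed would already be sufficient.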
If you want to read/write hit from several threads at the same time, then you need some kind of synchronization, otherwise you introduce a data race. You can either make hit a std::atomic<bool> or add a mutex that must be locked whenever hit is accessed. If you just want to wait for the thread to finish its job, then a plain thread.join() (with "finished" printed after it) is enough, without introducing any additional flags.
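A sketch of the mutex variant (helper name is hypothetical, delay shortened): every access to hit, reads included, happens under the lock, which removes the data race:

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Worker sets the flag under the mutex; the caller polls it under the
// same mutex until it observes true. Returns the final flag value.
bool wait_for_flag_mutex()
{
    bool hit = false;
    std::mutex m;

    std::thread t([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        std::lock_guard<std::mutex> lock(m);
        hit = true;
    });

    for (;;) {
        std::lock_guard<std::mutex> lock(m);  // re-acquired each iteration,
        if (hit)                              // so each read is a fresh,
            break;                            // synchronized access
    }

    t.join();
    return hit;
}
```

Spinning on a mutex like this is correct but wasteful; if the goal is only to wait for the worker, t.join() alone, or a std::condition_variable, is the more idiomatic choice.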
By declaring hit as volatile, you tell the compiler that this variable can be modified by external factors at any time, so it won't assume the value stays unchanged throughout your main function and will re-read it on every loop iteration.
Note, however, that volatile is not a synchronization primitive in C++: with one thread writing and another reading a non-atomic variable, the program still contains a data race, which is formally undefined behavior, even if it happens to work on common platforms. When you're dealing with multiple threads, it's safer to use proper synchronization tools, like atomic objects, mutexes and semaphores, as already mentioned in the other answers here.