
gcc optimisation effect on loops with apparently constant variable

I was optimizing some C++ code when I ran into a situation that can be simplified as follows.

Consider this code:

#include <chrono>
#include <iostream>
#include <thread>

using namespace std;

bool hit = false;

void F()
{
   this_thread::sleep_for(chrono::seconds(1));
   hit = true;
}

int main()
{
   thread t(F);

   while (!hit)
      ;

   cout << "finished" << endl;
   t.join();
   return 0;
}

This basically starts a thread that, after one second, sets hit to true. Meanwhile, main enters an empty loop that spins until hit becomes true. I compiled this with gcc-5.4 using the -g flag and everything was fine: the program prints finished and exits. But when I compiled it with the -O2 flag, the program got stuck in the loop forever.

Looking at the disassembly, the compiler had generated the following, which is the root cause of the infinite loop:

jmp 0x6ba6f3 ! 0x00000000006ba6f3

OK, so clearly the compiler has deduced that hit is false and will not change inside the loop, so it treats the loop as infinite without considering that another thread may change the value. This optimization is apparently only enabled at the higher level (-O2). Since I'm not exactly an expert on optimization flags, can anyone tell me which flag is responsible for this result so I can turn it off? Would turning it off have any major performance cost for other pieces of code? In other words, how rare is this code pattern?
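The transformation described above can be pictured roughly as follows. This is only a sketch of the effect, not the compiler's actual output: the function names wait_as_written and wait_as_optimized are made up for illustration, and the second function is an assumption about what hoisting the read of hit out of the loop amounts to.

```cpp
// Plain (non-atomic) flag, as in the question.
bool hit = false;

// What the source says: re-read hit on every iteration.
void wait_as_written()
{
   while (!hit)
      ;
}

// Roughly what the optimizer may produce (hypothetical rendering):
// hit is read once, and if it was false the loop never checks it again.
void wait_as_optimized()
{
   if (!hit)
      for (;;)
         ; // corresponds to a jmp that targets itself
}
```

Within a single thread the two functions behave identically, which is why the compiler is allowed to treat them as equivalent when hit is an ordinary bool.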

Sinapse asked Oct 27 '17 17:10



3 Answers

This code has Undefined Behavior. You're modifying hit from one thread and reading it from another, without synchronization.

Optimizing hit to false is a valid outcome of Undefined Behavior. You can solve this by making hit a std::atomic<bool>. This makes it well-defined, and blocks the optimization.
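A minimal sketch of the std::atomic<bool> fix, assuming the same hit and F as in the question. The helper wait_for_hit is a hypothetical name introduced here so the waiting logic can be called from main, and the sleep is shortened to 100 ms only to keep the example quick.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Shared flag; std::atomic makes the cross-thread write and read well-defined.
std::atomic<bool> hit{false};

// Worker: sleeps briefly, then publishes the flag.
void F()
{
   std::this_thread::sleep_for(std::chrono::milliseconds(100));
   hit.store(true);
}

// Hypothetical helper: spins until the worker sets the flag,
// then joins the worker and returns the final flag value.
bool wait_for_hit()
{
   std::thread t(F);

   while (!hit.load())
      ; // every iteration performs a real atomic load

   t.join();
   assert(hit.load()); // the loop only exits once the flag is set
   return hit.load();
}
```

Calling wait_for_hit() from main and printing "finished" afterwards should terminate at any optimization level, because the compiler may not assume an atomic variable stays unchanged across iterations.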

MSalters answered Oct 20 '22 00:10


If you want to read and write hit from several threads at the same time, you need some kind of synchronization; otherwise you introduce a race condition. You can either make hit a std::atomic<bool> or add a mutex that must be locked whenever hit is accessed. If you just want to wait for the thread to finish its job, then you can simply call t.join() (and print "finished" after it) without introducing any additional flags.
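The mutex option could look something like the sketch below. The names hit_mutex, read_hit and wait_for_hit_with_mutex are hypothetical and not from the question, and the 100 ms sleep is chosen only to keep the example short.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

std::mutex hit_mutex; // guards every access to hit
bool hit = false;

// Worker: sleeps briefly, then sets the flag while holding the lock.
void F()
{
   std::this_thread::sleep_for(std::chrono::milliseconds(100));
   std::lock_guard<std::mutex> lock(hit_mutex);
   hit = true;
}

// Reads the flag while holding the lock.
bool read_hit()
{
   std::lock_guard<std::mutex> lock(hit_mutex);
   return hit;
}

// Hypothetical helper: polls the flag under the mutex until it is set.
bool wait_for_hit_with_mutex()
{
   std::thread t(F);

   while (!read_hit())
      std::this_thread::yield(); // let other threads run between polls

   t.join();
   assert(read_hit()); // the loop only exits once the flag is set
   return read_hit();
}
```

If the only goal is to wait for the worker to finish, the join-only variant needs neither the flag nor the mutex: construct the thread, call t.join(), then print "finished".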

user7860670 answered Oct 20 '22 02:10


By declaring hit as volatile, you tell the compiler that this variable can be modified by external factors at any time, so the compiler won't assume that its value stays unchanged while your main function runs.

As long as only one thread writes to the hit variable, this will usually work in practice on common platforms. Note, however, that volatile does not make the access atomic or synchronized in the C++ memory model, so the unsynchronized read and write are still formally a data race. When you're dealing with multiple threads, it's always safer to use synchronization tools, like atomic objects, mutexes and semaphores, as already mentioned in the other answers here.

Milack27 answered Oct 20 '22 00:10