Recently, I upgraded my project from gcc 4.3 to gcc 5.5. After that, I see a change of behaviour in the post-increment operator which is causing issues in my project. I am using a global variable as a control variable. For example, consider this sample program:
int i = 0;
int main()
{
int x[10];
x[i++] = 5; ===> culprit
return 0;
}
In the above snippet, the value of i
should be incremented only after 5
has been assigned to x[0]
which would guarantee that x[0]
has a proper valid value assigned before i
is incremented.
Now here comes the problem, I see that after moving to gcc 5.5, the assembly instructions have changed and the value of i gets incremented even before the assignment has happened. Assembly instruction of the above snippet:
Dump of assembler code for function main():
6 {
0x0000000000400636 <+0>: push %rbp
0x0000000000400637 <+1>: mov %rsp,%rbp
7 int x[10];
8
9 x[i++] = 1;
0x000000000040063a <+4>: mov 0x200a00(%rip),%eax # 0x601040 <i>
0x0000000000400640 <+10>: lea 0x1(%rax),%edx
0x0000000000400643 <+13>: mov %edx,0x2009f7(%rip) # 0x601040 <i> ====> i gets incremented here
0x0000000000400649 <+19>: cltq
0x000000000040064b <+21>: movl $0x5,-0x30(%rbp,%rax,4) =====> x[0] is assigned value here
10
11 return 0;
0x0000000000400653 <+29>: mov $0x0,%eax
12
13 }
0x0000000000400658 <+34>: pop %rbp
0x0000000000400659 <+35>: retq
Because of the above assembly another thread within the process using variable i
starts reading incorrect values from the global array.
Now the same code when compiled using gcc 4.3, adheres to the behaviour which I understand, i.e., the value is first assigned and then i
is incremented. Assembly instruction using gcc 4.3 for the same snippet:
Dump of assembler code for function main():
5 int main()
0x00000000004005da <+0>: push %rbp
0x00000000004005db <+1>: mov %rsp,%rbp
6 {
7 int x[10];
8
9 x[i++] = 1;
0x00000000004005de <+4>: mov 0x200a64(%rip),%edx # 0x601048 <i>
0x00000000004005e4 <+10>: movslq %edx,%rax
0x00000000004005e7 <+13>: movl $0x5,-0x30(%rbp,%rax,4) ======> x[0] gets assigned here
0x00000000004005ef <+21>: lea 0x1(%rdx),%eax
0x00000000004005f2 <+24>: mov %eax,0x200a50(%rip) # 0x601048 <i> ======> i gets incremented here
10
11 return 0;
0x00000000004005f8 <+30>: mov $0x0,%eax
12
13 }
0x00000000004005fd <+35>: leaveq
0x00000000004005fe <+36>: retq
I want to know if this is what is the expected behaviour with the new compilers? Is there any switch using which I can switch back to the old behaviour? Or is this a bug present in the new compiler?
Any help or leads will be appreciated.
Note: I want to avoid locks during the reading of i
because of performance issues. The culprit
line in the above code gets executed within a lock. So, only one thread can update i
at any point but because of the change in assembly instructions within the compilers, a race condition is introduced without any code change.
Edit 1: I know there are locking issues and I am also keeping that as an option but what I actually want to know is if there is any switch or flag using which I can get back to the older behaviour. The code base is very very huge and I would have to go through the entire code base to check if there exist similar problems at other locations in the code. So, getting the old behaviour back would be the life-saver.
You misunderstood the guarantees that post-increment offers. It guarantees that the location in which x
is stored will be calculated using the old value of i
. It absolutely does not guarantee that x
will be stored before the updated value is stored in i
.
The compiler is perfectly at liberty to convert the code to:
int temp = i;
i = temp+1;
x[temp] = 5;
Furthermore, if you have one thread modifying i
and another thread reading i
, you have no guarantees about which value the other thread will see. You don't even get a guarantee that it will see either the new value or the old value unless i
is std::atomic
.
Given you are trying to update both i
and x
in a coordinated way, you will have to have a lock.
The compiler could even convert your code to:
i = i + 1;
// <<<<
x[i-1] = 5;
Which is interesting if another thread jumps in and modifies i
at the point marked <<<<
.
When reading and writing the same variable from different threads, you must use some synchronization mechanism, e.g. a mutex
or an atomic
variable (if you want to synchronize just a single variable). Platforms other than x86 are much less forgiving in this sense.
In addition, when more than one variable is being modified, you need to ensure happens-before memory ordering semantics, in order for the other thread to "see" the new values in the right order in time. Using a mutex
automatically ensures acquire-release semantics (i.e. anything happened in one thread before releasing the mutex will be visible to the thread that locks it again). Without memory ordering you have no guarantee when the threads will see each other's changes in time (or at all).
An atomic
also comes with memory ordering, but only for the variable itself.
Without any memory synchronization the compiler assumes there are no other threads running, and is free to order memory accesses any way it likes. It may move the increment part before, after, or really anywhere as long as there is no observable effect in the current thread.
If you want to know more I recommend watching Herb's excellent talk on memory ordering.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With