Is the compiler allowed to optimize this (according to the C++17 standard):
int fn() { volatile int x = 0; return x; }
to this?
int fn() { return 0; }
If yes, why? If not, why not?
Here's some thinking about this subject: current compilers compile fn() by putting x on the stack as a local variable and then returning it. For example, on x86-64, gcc creates this:
mov DWORD PTR [rsp-0x4],0x0 // this is x
mov eax,DWORD PTR [rsp-0x4] // eax is the return register
ret
Now, as far as I know the standard doesn't say that a local volatile variable should be put on the stack. So, this version would be equally good:
mov edx,0x0 // this is x
mov eax,edx // eax is the return
ret
Here, edx stores x. But now, why stop here? As edx and eax are both zero, we could just say:
xor eax,eax // eax is the return, and x as well
ret
And we have transformed fn() to the optimized version. Is this transformation valid? If not, which step is invalid?
No. Access to volatile objects is considered observable behavior, exactly as I/O, with no particular distinction between locals and globals.
The least requirements on a conforming implementation are:
- Access to volatile objects are evaluated strictly according to the rules of the abstract machine. [...]
These collectively are referred to as the observable behavior of the program.
N3690, [intro.execution], ¶8
How exactly this is observable is outside the scope of the standard, and falls squarely into implementation-specific territory, exactly as I/O and access to global volatile objects do. volatile means "you think you know everything going on here, but it's not like that; trust me and do this stuff without being too smart, because I'm in your program doing my secret stuff with your bytes". This is actually explained at [dcl.type.cv] ¶7:
[ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. — end note ]
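To make the contrast concrete, here is a minimal sketch of the two cases side by side. The names fn_plain and fn_volatile are made up for illustration and do not appear in the question. Both functions return 0, but as far as I can tell only the first may be folded to a bare "return 0;".

```cpp
// Hypothetical names, for illustration only.

// No volatile object involved: nothing here is observable behavior,
// so under the as-if rule the compiler may reduce this to "return 0;".
int fn_plain() {
    int x = 0;
    return x;
}

// The read of x is an access to a volatile object and therefore part
// of the observable behavior. In practice compilers also emit the
// initializing store. Where x lives (stack slot, register, something
// else) is left to the implementation.
int fn_volatile() {
    volatile int x = 0;  // store to a volatile object
    return x;            // load from a volatile object
}
```

Comparing the generated assembly for the two functions at -O2 is the easiest way to see the difference; the exact instructions will depend on the compiler and target.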
This loop can be optimised away by the as-if rule because it has no observable behaviour:
for (unsigned i = 0; i < n; ++i) { bool looped = true; }
This one cannot:
for (unsigned i = 0; i < n; ++i) { volatile bool looped = true; }
The second loop does something on every iteration, which means the loop takes O(n) time. I have no idea what the constant is, but I can measure it and then I have a way of busy looping for a (more or less) known amount of time.
I can do that because the standard says that access to volatiles must happen, in order. If a compiler were to decide that in this case the standard didn't apply, I think I would have the right to file a bug report.
If the compiler chooses to put looped into a register, I suppose I have no good argument against that. But it still must set the value of that register to 1 for every loop iteration.
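As a rough sketch of the "measure the constant" idea, assuming a hosted implementation with std::chrono available: busy_loop and time_busy_loop are hypothetical helper names, not anything from the standard. The first spins for n iterations with one volatile write per iteration, and the second times it so the per-iteration cost can be estimated.

```cpp
#include <chrono>

// Hypothetical helper: performs one volatile write per iteration,
// so the loop body cannot be discarded under the as-if rule.
// Returns the number of iterations executed.
unsigned busy_loop(unsigned n) {
    unsigned i = 0;
    for (; i < n; ++i) {
        volatile bool looped = true;
    }
    return i;
}

// Hypothetical helper: measures how long busy_loop(n) takes.
// Dividing the result by n gives an estimate of the per-iteration
// constant on this particular machine and build.
std::chrono::nanoseconds time_busy_loop(unsigned n) {
    const auto start = std::chrono::steady_clock::now();
    busy_loop(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

The measured time is only meaningful for the machine, compiler, and flags it was taken with, which is the "more or less" in the paragraph above.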