I noticed that clang and gcc optimize away the construction of or assignment to a volatile struct
declared on the stack, in some scenarios. For example, the following code:
#include <cstdint>

struct nonvol2 {
    uint32_t a, b;
};

void volatile_struct2()
{
    volatile nonvol2 temp = {1, 2};
}
Compiles on clang to:
volatile_struct2(): # @volatile_struct2()
    ret
On the other hand, gcc does not remove the stores, although it does optimize the two implied stores into a single one:
volatile_struct2():
    movabs rax, 8589934593
    mov QWORD PTR [rsp-8], rax
    ret
Oddly, clang won't optimize away a volatile store to a single int variable:
void volatile_int() {
    volatile int x = 42;
}
Compiles to:
volatile_int(): # @volatile_int()
    mov dword ptr [rsp - 4], 42
    ret
Furthermore, a struct with one member rather than two is not optimized away.
Although gcc doesn't remove the construction in this particular case, it performs perhaps even more aggressive optimizations when the struct members themselves are declared volatile, rather than the struct itself at the point of construction:
typedef struct {
    volatile uint32_t a, b;
} vol2;

void volatile_def2()
{
    vol2 temp = {1, 2};
    vol2 temp2 = {1, 2};
    temp.a = temp2.a;
    temp.a = temp2.a;
}
This compiles down to a single ret.
While it seems entirely "reasonable" to remove these stores, which are pretty much impossible to observe by any reasonable process, my impression was that in the standard, volatile loads and stores are assumed to be part of the observable behavior of the program (in addition to calls to IO functions), full stop. The implication is that they are not subject to removal under the "as if" rule, since removing them would by definition change the observable behavior of the program.
Am I wrong about that, or is clang breaking the rules here? Perhaps construction is excluded from the cases where volatile must be assumed to have side effects?
From the point of view of the Standard, there is no requirement that implementations document anything about how any objects are physically stored in memory. Even if an implementation documents the behavior of using pointers of type unsigned char* to access objects of a certain type, an implementation would be allowed to physically store data some other way and then have the code for character-based reads and writes adjust behaviors suitably.
If an execution platform specifies a relationship between abstract-machine objects and storage seen by the CPU, and defines ways by which accesses to certain CPU addresses might trigger side effects the compiler doesn't know about, a quality compiler suitable for low-level programming on that platform should generate code where the behavior of volatile-qualified objects is consistent with that specification. The Standard makes no attempt to mandate that all implementations be suitable for low-level programming (or any other particular purpose, for that matter).
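As a rough illustration of the kind of low-level code this paragraph has in mind, here is a minimal sketch. The names fake_register, write_twice and demo_register_value are hypothetical, and an ordinary volatile variable stands in for a real memory-mapped register (on actual hardware the address would come from the platform's documentation), so the example can run on a hosted system:

```cpp
#include <cstdint>

// Hypothetical stand-in for a device register. On real hardware this would
// be a fixed address defined by the platform, not an ordinary variable.
static volatile std::uint32_t fake_register = 0;

// Each assignment below is a separate volatile access. A compiler suited to
// low-level work on the platform is expected to emit both stores, in order,
// because the hardware might react to each write.
void write_twice(volatile std::uint32_t* reg) {
    *reg = 1;
    *reg = 2;
}

// Performs the two writes and reads the register back.
std::uint32_t demo_register_value() {
    write_twice(&fake_register);
    return fake_register;  // volatile read of the last value written
}
```

Here the address of the object is exposed through a pointer, which is the situation where the platform's mapping between objects and storage matters most.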
If the address of an automatic variable is never exposed to outside code, a volatile qualifier need have only two effects:
1. If setjmp is called within a function, a compiler must do whatever is necessary to ensure that longjmp will not disrupt the values of any volatile-qualified objects, even if they were written between the setjmp and longjmp. Absent the qualifier, the value of objects written between setjmp and longjmp would become indeterminate when a longjmp is executed.
2. Rules which would allow a compiler to presume that any loops which don't have side effects will run to completion do not apply in cases where a volatile object is accessed within the loop, whether or not an implementation would define any means by which such access would be observable.
Except in those cases, the as-if rule would allow a compiler to implement the volatile qualifier in the abstract machine in a way that has no relation to the physical machine.
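The first of the two effects above, the setjmp case, can be sketched as follows. This is only an illustration: jump_target, jump_back and counter_after_longjmp are hypothetical names, and the local is never exposed to outside code:

```cpp
#include <csetjmp>

// Hypothetical jump buffer shared by the two functions below.
static std::jmp_buf jump_target;

// Transfers control back to the setjmp call site; never returns.
static void jump_back() {
    std::longjmp(jump_target, 1);
}

int counter_after_longjmp() {
    // volatile: the value written between setjmp and longjmp must survive
    // the jump. Without the qualifier it would be indeterminate afterwards.
    volatile int counter = 0;

    if (setjmp(jump_target) == 0) {
        counter = 42;  // modified after setjmp, before longjmp
        jump_back();   // control resumes at setjmp, which now returns 1
    }

    // Reached only via longjmp.
    return counter;
}
```

Calling counter_after_longjmp() yields 42. If counter were not volatile, reading it after the longjmp would not be guaranteed to give any particular value.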