The function 'foo' in the snippet below will load only one of the struct members A or B; well at least that is the intention of the unoptimized code.
typedef struct { int A; int B; } Pair; int foo(const Pair *P, int c) { int x; if (c) x = P->A; else x = P->B; return c/102 + x; }
Here is what gcc -O3 gives:
mov eax, esi mov edx, -1600085855 test esi, esi mov ecx, DWORD PTR [rdi+4] <-- ***load P->B** cmovne ecx, DWORD PTR [rdi] <-- ***load P->A*** imul edx lea eax, [rdx+rsi] sar esi, 31 sar eax, 6 sub eax, esi add eax, ecx ret
So it appears that gcc is allowed to speculatively load both struct members in order to eliminate branching. But then, is the following code considered undefined behavior or is the gcc optimization above illegal?
#include <stdlib.h> int naughty_caller(int c) { Pair *P = (Pair*)malloc(sizeof(Pair)-1); // *** Allocation is enough for A but not for B *** if (!P) return -1; P->A = 0x42; // *** Initializing allocation only where it is guaranteed to be allocated *** int res = foo(P, 1); // *** Passing c=1 to foo should ensure only P->A is accessed? *** free(P); return res; }
If the load speculation will happen in the above scenario there is a chance that loading P->B will cause an exception because the last byte of P->B may lie in unallocated memory. This exception will not happen if the optimization is turned off.
Is the gcc optimization shown above of load speculation legal? Where does the spec say or imply that it's ok? If the optimization is legal, how is the code in 'naughtly_caller' turn out to be undefined behavior?
Use the command-line option -O0 (-[capital o][zero]) to disable optimization, and -S to get assembly file. Look here to see more gcc command-line options.
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).
The compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, gcc attempts to reduce the compilation time and to make debugging always yield the result expected from reading the source code.
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. The compiler performs optimization based on the knowledge it has of the program.
Reading a variable (that was not declared as volatile
) is not considered to be a "side effect" as specified by the C standard. So the program is free to read a location and then discard the result, as far as the C standard is concerned.
This is very common. Suppose you request 1 byte of data from a 4 byte integer. The compiler may then read the whole 32 bits if that's faster (aligned read), and then discard everything but the requested byte. Your example is similar to this but the compiler decided to read the whole struct.
Formally this is found in the behavior of "the abstract machine", C11 chapter 5.1.2.3. Given that the compiler follows the rules specified there, it is free to do as it pleases. And the only rules listed are regarding volatile
objects and sequencing of instructions. Reading a different struct member in a volatile
struct would not be ok.
As for the case of allocating too little memory for the whole struct, that's undefined behavior. Because the memory layout of the struct is usually not for the programmer to decide - for example the compiler is allowed to add padding at the end. If there's not enough memory allocated, you might end up accessing forbidden memory even though your code only works with the first member of the struct.
No, if *P
is allocated correctly P->B
will never be in unallocated memory. It might not be initialized, that is all.
The compiler has every right to do what they do. The only thing that is not allowed is to oops about the access of P->B
with the excuse that it is not initialized. But what and how they do all of this is under the discretion of the implementation and not your concern.
If you cast a pointer to a block returned by malloc
to Pair*
that is not guaranteed to be wide enough to hold a Pair
the behavior of your program is undefined.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With