Why does gcc fill the whole array with zeros instead of only the remaining 96 integers? The non-zero initializers are all at the start of the array.
void *sink;
void bar() {
int a[100]{1,2,3,4};
sink = a; // a escapes the function
asm("":::"memory"); // and compiler memory barrier
// forces the compiler to materialize a[] in memory instead of optimizing away
}
MinGW8.1 and gcc9.2 both make asm like this (Godbolt compiler explorer).
# gcc9.2 -O3 -m32 -mno-sse
bar():
push edi # save call-preserved EDI which rep stos uses
xor eax, eax # eax=0
mov ecx, 100 # repeat-count = 100
sub esp, 400 # reserve 400 bytes on the stack
mov edi, esp # dst for rep stos
mov DWORD PTR sink, esp # sink = a
rep stosd # memset(a, 0, 400)
mov DWORD PTR [esp], 1 # then store the non-zero initializers
mov DWORD PTR [esp+4], 2 # over the zeroed part of the array
mov DWORD PTR [esp+8], 3
mov DWORD PTR [esp+12], 4
# memory barrier empty asm statement is here.
add esp, 400 # cleanup the stack
pop edi # and restore caller's EDI
ret
(with SSE enabled it would copy all 4 initializers with movdqa load/store)
Why doesn't GCC do lea edi, [esp+16]
and memset (with rep stosd
) only the last 96 elements, like Clang does? Is this a missed optimization, or is it somehow more efficient to do it this way? (Clang actually calls memset
instead of inlining rep stos
)
Editor's note: the question originally had un-optimized compiler output which worked the same way, but inefficient code at -O0
doesn't prove anything. But it turns out that this optimization is missed by GCC even at -O3
.
Passing a pointer to a
to a non-inline function would be another way to force the compiler to materialize a[]
, but in 32-bit code that leads to significant clutter of the asm. (Stack args result in pushes, which gets mixed in with stores to the stack to init the array.)
Using volatile a[100]{1,2,3,4}
gets GCC to create and then copy the array, which is insane. Normally volatile
is good for looking at how compilers init local variables or lay them out on the stack.
Using Initializer List. int arr[] = { 1, 1, 1, 1, 1 }; The array will be initialized to 0 if we provide the empty initializer list or just specify 0 in the initializer list.
int num[5] = {1, 1, 1, 1, 1}; This will initialize the num array with value 1 at all index. The array will be initialized to 0 in case we provide empty initializer list or just specify 0 in the initializer list. Designated Initializer: This initializer is used when we want to initialize a range with the same value.
In theory your initialization could look like that:
int a[100] = {
[3] = 1,
[5] = 42,
[88] = 1,
};
so it may be more effective in sense of cache and optimizablity to first zero out the whole memory block and then set individual values.
May be the behavior changes depending on:
Of course, in your case the initialization are compacted at the start of the array and the optimization would be trivial.
So it seems that gcc is doing the most generic approach here. Looks like a missing optimization.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With