Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does GCC aggregate initialization of an array fill the whole thing with zeros first, including non-zero elements?

Tags:

Why does gcc fill the whole array with zeros instead of only the remaining 96 integers? The non-zero initializers are all at the start of the array.

void *sink;
void bar() {
    int a[100]{1,2,3,4};
    sink = a;             // a escapes the function
    asm("":::"memory");   // and compiler memory barrier
    // forces the compiler to materialize a[] in memory instead of optimizing away
}

MinGW8.1 and gcc9.2 both make asm like this (Godbolt compiler explorer).

# gcc9.2 -O3 -m32 -mno-sse
bar():
    push    edi                       # save call-preserved EDI which rep stos uses
    xor     eax, eax                  # eax=0
    mov     ecx, 100                  # repeat-count = 100
    sub     esp, 400                  # reserve 400 bytes on the stack
    mov     edi, esp                  # dst for rep stos
        mov     DWORD PTR sink, esp       # sink = a
    rep stosd                         # memset(a, 0, 400) 

    mov     DWORD PTR [esp], 1        # then store the non-zero initializers
    mov     DWORD PTR [esp+4], 2      # over the zeroed part of the array
    mov     DWORD PTR [esp+8], 3
    mov     DWORD PTR [esp+12], 4
 # memory barrier empty asm statement is here.

    add     esp, 400                  # cleanup the stack
    pop     edi                       # and restore caller's EDI
    ret

(with SSE enabled it would copy all 4 initializers with movdqa load/store)

Why doesn't GCC do lea edi, [esp+16] and memset (with rep stosd) only the last 96 elements, like Clang does? Is this a missed optimization, or is it somehow more efficient to do it this way? (Clang actually calls memset instead of inlining rep stos)


Editor's note: the question originally had un-optimized compiler output which worked the same way, but inefficient code at -O0 doesn't prove anything. But it turns out that this optimization is missed by GCC even at -O3.

Passing a pointer to a to a non-inline function would be another way to force the compiler to materialize a[], but in 32-bit code that leads to significant clutter of the asm. (Stack args result in pushes, which gets mixed in with stores to the stack to init the array.)

Using volatile a[100]{1,2,3,4} gets GCC to create and then copy the array, which is insane. Normally volatile is good for looking at how compilers init local variables or lay them out on the stack.

like image 952
Lassie Avatar asked Nov 24 '19 20:11

Lassie


People also ask

How do you initialize an array with 0?

Using Initializer List. int arr[] = { 1, 1, 1, 1, 1 }; The array will be initialized to 0 if we provide the empty initializer list or just specify 0 in the initializer list.

How do you initialize an entire array with value?

int num[5] = {1, 1, 1, 1, 1}; This will initialize the num array with value 1 at all index. The array will be initialized to 0 in case we provide empty initializer list or just specify 0 in the initializer list. Designated Initializer: This initializer is used when we want to initialize a range with the same value.


1 Answers

In theory your initialization could look like that:

int a[100] = {
  [3] = 1,
  [5] = 42,
  [88] = 1,
};

so it may be more effective in sense of cache and optimizablity to first zero out the whole memory block and then set individual values.

May be the behavior changes depending on:

  • target architecture
  • target OS
  • array length
  • initialization ratio (explicitly initialized values/length)
  • positions of the initialized values

Of course, in your case the initialization are compacted at the start of the array and the optimization would be trivial.

So it seems that gcc is doing the most generic approach here. Looks like a missing optimization.

like image 143
vlad_tepesch Avatar answered Sep 17 '22 17:09

vlad_tepesch