
gcc Strange -O0 code generation. Simple malloc. Pointer to multidimensional array

Tags: c, pointers, gcc

Very simple code:

#include <stdlib.h>

void *allocateMemory5DArray(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    int (*array)[x][y][z][q][r];

    array = malloc(sizeof(*array));
    return array;
}

At -O0, gcc uses 296 bytes of stack and the generated code is more than 180 lines long. Can anyone explain the rationale behind it?

Other compilers (except clang) also generate strange code, but not as strange as gcc :)

https://godbolt.org/z/1zx4YE

0___________ asked Jan 17 '21 14:01


1 Answer

This behaviour is not specific to pointers to variable-length arrays: it also appears with ordinary VLAs, and Clang generates shorter code than GCC there as well.

Although the code generated by GCC is longer, -O0 is aimed at the fastest compilation time (and it apparently is the fastest setting for GCC). The assembly is not optimised, but we didn't ask for that. At -O1, GCC spends extra compilation time on optimisation, and the generated code is quite similar to Clang's.
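
As a rough illustration of why the unoptimised output grows with the number of dimensions: the size of a variably modified type is only known at run time, so the compiler has to keep the bounds around and multiply them. The sketch below is not taken from the question; the helper names sizeOf5DArray and roundTripLastElement are hypothetical, and it only shows that sizeof is evaluated at run time and how the returned pointer can be indexed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: size in bytes of an int[x][y][z][q][r] object.
   Because the type is variably modified, sizeof is computed at run time
   from the five bounds. All bounds must be non-zero. */
size_t sizeOf5DArray(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    assert(x && y && z && q && r);
    return sizeof(int[x][y][z][q][r]);
}

/* Hypothetical helper: allocates the array as in the question, writes the
   last element through the pointer-to-array type, reads it back and frees
   the block. Returns the value read, or -1 if malloc fails. */
int roundTripLastElement(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    assert(x && y && z && q && r);

    int (*array)[x][y][z][q][r] = malloc(sizeof(int[x][y][z][q][r]));
    if (array == NULL)
        return -1;

    /* Dereference once to get the 5D array, then index each dimension. */
    (*array)[x - 1][y - 1][z - 1][q - 1][r - 1] = 42;
    int value = (*array)[x - 1][y - 1][z - 1][q - 1][r - 1];

    free(array);
    return value;
}
```

Every index expression here needs the bounds of the inner dimensions to compute an offset, which is presumably part of what the -O0 output spills to the stack.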


There are differences between Clang and GCC regarding VLAs. Clang doesn't support VLAs in structures (a GCC extension), for these reasons:

  • it is tricky to implement
  • the extension is completely undocumented
  • the extension appears to be rarely used

Clang is happy with C99 VLAs, but that's all. GCC 4.1 (bear in mind that C99 was only "substantially completely supported" as of GCC 4.5) generates similarly small code:

   ...
    mov     %rax, QWORD PTR [%rbp-56]
    mov     %rdx, QWORD PTR [%rbp-48]
    mov     %rcx, QWORD PTR [%rbp-40]
    mov     %rsi, QWORD PTR [%rbp-32]
    mov     %rdi, QWORD PTR [%rbp-24]
    ...

However, with GCC 4.8 the code gets larger. The GCC 4.8 changes page says nothing about changes to VLAs, which is odd given the clear differences in the generated code.

The "Status of C99 features in GCC" page indicates that there were "Various corner cases fixed in GCC 4.5" related to VLAs. However, the 4.5 changelog says nothing about them. Surprisingly, the assembly is slightly different in 4.4 but not in 4.5.

It looks like Clang's reasons for rejecting VLAs in structs were accurate, and in some cases they may extend to the whole VLA feature.


This poor behaviour is well known. The Linux kernel is free of VLAs in the name of performance:

   Buffer allocation |  Encoding throughput (Mbit/s)
 ---------------------------------------------------
  on-stack, VLA      |   3988
  on-stack, fixed    |   4494
  kmalloc            |   1967

which is also good news for those building the kernel with Clang.
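
To make the "on-stack, VLA" versus "on-stack, fixed" rows concrete, here is a minimal sketch of the two buffer styles. It is not the kernel code behind the table: MAX_BLOCK, sumWithVla and sumWithFixed are hypothetical names, and the upper bound of 64 is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCK 64 /* assumed upper bound on n, chosen for illustration */

/* On-stack VLA: the frame size depends on n, so the compiler must adjust
   the stack pointer at run time. */
unsigned sumWithVla(const unsigned char *src, size_t n)
{
    assert(n > 0 && n <= MAX_BLOCK);

    unsigned char buf[n];
    memcpy(buf, src, n);

    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* On-stack fixed buffer: the frame size is a compile-time constant and
   only the first n bytes are used. */
unsigned sumWithFixed(const unsigned char *src, size_t n)
{
    assert(n > 0 && n <= MAX_BLOCK);

    unsigned char buf[MAX_BLOCK];
    memcpy(buf, src, n);

    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

Both functions return the same result. The fixed version trades some unused stack for a frame layout known at compile time, and whether that is measurably faster depends on the compiler and target.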

Jose answered Oct 19 '22 01:10