Why doesn't clang optimize a global const like a #define?

I have this test program, using a #define constant:

#include <stdio.h>

#define FOO 1

int main()
{
    printf("%d\n", FOO);

    return 0;
}

When compiled with “Apple LLVM version 10.0.0 (clang-1000.11.45.5)”, I get an executable of 8432 bytes. Here is the assembly listing:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    leaq    L_.str(%rip), %rdi
    movl    $1, %esi
    movl    $0, -4(%rbp)
    movb    $0, %al
    callq   _printf
    xorl    %esi, %esi
    movl    %eax, -8(%rbp)          ## 4-byte Spill
    movl    %esi, %eax
    addq    $16, %rsp
    popq   %rbp
    retq
                                        ## -- End function
    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "%d\n"


Now I replace #define FOO 1 with const int FOO = 1;. The executable is now 8464 bytes and the assembly listing looks like this:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    leaq    L_.str(%rip), %rdi
    movl    $1, %esi
    movl    $0, -4(%rbp)
    movb    $0, %al
    callq   _printf
    xorl    %esi, %esi
    movl    %eax, -8(%rbp)          ## 4-byte Spill
    movl    %esi, %eax
    addq    $16, %rsp
    popq   %rbp
    retq
                                        ## -- End function
    .section    __TEXT,__const
    .globl  _FOO                    ## @FOO
    .p2align    2
_FOO:
    .long   1                       ## 0x1

    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "%d\n"


So the compiler actually emitted storage for a FOO variable, making the executable 32 bytes bigger. I get the same result at the -O3 optimization level.

Why is that? Normally, the compiler should be intelligent enough to optimize and add the constant to the symbol table instead of taking up storage for it.

2 Answers

This is another case where the difference between C and C++ matters.

In C, const int FOO has external linkage and must thus be included in the binary.

Compiling with g++ or clang++ instead gives you the desired optimization as FOO has internal linkage in C++.

You can achieve the optimization in C mode by explicitly requesting internal linkage for FOO via

static const int FOO = 1;
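
For illustration, here is a minimal sketch of the internal-linkage version. The helper name print_foo is hypothetical and only stands in for the body of main() from the question. Since no other translation unit can refer to a static object, the compiler is free to fold every use of FOO to the literal 1 and emit no storage for it.

```c
#include <stdio.h>

/* Internal linkage: FOO is visible only inside this translation unit,
   so the compiler does not have to keep an object around for other files. */
static const int FOO = 1;

/* Hypothetical helper standing in for main() from the question.
   Returns whatever printf returns (the number of characters written). */
int print_foo(void)
{
    return printf("%d\n", FOO);
}
```

If you compile this with clang -S, I would expect the listing to contain no _FOO symbol and no __const section, though the exact output depends on the compiler version and flags.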

Both clang and gcc with link-time optimization enabled (-flto) also manage to strip away the unused symbol, even when linkage is external.

The fact that you use the variable FOO in your second program means that it has to live somewhere, so the compiler must allocate storage for it.

In the #define case, there is no variable: the preprocessor substituted the text "FOO" with the text "1", and so the call to printf() was passed a constant value, not a variable.
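
A rough sketch of that difference, using the hypothetical names FOO_MACRO and FOO_CONST so both forms can sit in one file: the macro disappears before compilation proper, while the const is an object whose address can be taken.

```c
#define FOO_MACRO 1        /* hypothetical name: pure text substitution, no object */
const int FOO_CONST = 1;   /* hypothetical name: a real object with external linkage */

/* After preprocessing, this function body is literally `return 1;`. */
int macro_value(void)
{
    return FOO_MACRO;
}

/* An object has an address. Writing &FOO_MACRO instead would expand
   to &1, which does not compile. */
const int *const_address(void)
{
    return &FOO_CONST;
}
```

Taking the address is one thing that forces the object to exist. When the value is only read, as in the question, the compiler can still pass the literal 1 to printf() in both cases, which matches the two assembly listings.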

