
GCC behavior for unresolved weak functions

Tags: c, gcc, ld, arm, weak

Consider the simple program below:

__attribute__((weak)) void weakf(void);

int main(int argc, char *argv[])
{
        weakf();
}

When this is compiled with gcc and run on a Linux PC, it segfaults. When it is built for an ARM Cortex-M0 (arm-none-eabi-gcc), the linker replaces the call to the undefined symbol with a jump to the following instruction and a nop.

Where is this behavior documented? Are there ways to change it through command-line options? I have been through the GCC and LD documentation and found no information about it.

If I check the ARM compiler doc however, this is clearly explained.

asked Jul 03 '15 09:07 by calandoa



1 Answer

I was reading some docs and happened to come across a related quote for this:

man nm

says:

"V"
"v" The symbol is a weak object. When a weak defined symbol is linked with a normal defined symbol, the normal defined symbol is used with no error. When a weak undefined symbol is linked and the symbol is not defined, the value of the weak symbol becomes zero with no error. On some systems, uppercase indicates that a default value has been specified.

"W"
"w" The symbol is a weak symbol that has not been specifically tagged as a weak object symbol. When a weak defined symbol is linked with a normal defined symbol, the normal defined symbol is used with no error. When a weak undefined symbol is linked and the symbol is not defined, the value of the symbol is determined in a system-specific manner without error. On some systems, uppercase indicates that a default value has been specified.

nm is part of Binutils, which GCC uses under the hood, so this should be canonical enough.

Then, as an example, take your source file:

main.c

__attribute__((weak)) void weakf(void);

int main(int argc, char *argv[])
{
        weakf();
}

we do:

gcc -O0 -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
nm main.out

which contains:

w weakf

so its value is system-specific. However, I can't find where the per-system behavior is defined, and I don't think you can do better than reading the Binutils source here.

A v entry would be fixed to 0, but that letter is used for undefined weak variables (which are objects); see the related question "How to make weak linking work with GCC?".

Then:

gdb -batch -ex 'disassemble/rs main' main.out

gives:

Dump of assembler code for function main:
main.c:
4       {
   0x0000000000001135 <+0>:     55      push   %rbp
   0x0000000000001136 <+1>:     48 89 e5        mov    %rsp,%rbp
   0x0000000000001139 <+4>:     48 83 ec 10     sub    $0x10,%rsp
   0x000000000000113d <+8>:     89 7d fc        mov    %edi,-0x4(%rbp)
   0x0000000000001140 <+11>:    48 89 75 f0     mov    %rsi,-0x10(%rbp)

5               weakf();
   0x0000000000001144 <+15>:    e8 e7 fe ff ff  callq  0x1030 <weakf@plt>
   0x0000000000001149 <+20>:    b8 00 00 00 00  mov    $0x0,%eax

6       }
   0x000000000000114e <+25>:    c9      leaveq 
   0x000000000000114f <+26>:    c3      retq   
End of assembler dump.

which means the call goes through the PLT.

Since I don't fully understand the PLT, I verified experimentally that it resolves to address 0 and segfaults:

gdb -nh -ex run -ex bt main.out
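Given that the address resolves to 0 on this system, the crash can be avoided by testing the function's address before calling it. The following is only a minimal sketch, assuming GCC on an ELF target where an undefined weak function compares equal to a null pointer; call_weakf_if_defined is a hypothetical helper name introduced for illustration:

#include <stdio.h>

/* Weak declaration only: no definition is linked in this sketch. */
__attribute__((weak)) void weakf(void);

/* Hypothetical helper: calls weakf only if the linker resolved it.
 * Returns 1 if the call was made, 0 if the symbol was left undefined. */
int call_weakf_if_defined(void)
{
        if (weakf) {            /* address is 0 when weakf is undefined */
                weakf();
                return 1;
        }
        return 0;
}

int main(void)
{
        if (call_weakf_if_defined())
                puts("weakf was called");
        else
                puts("weakf is undefined, call skipped");
        return 0;
}

If another object file defining weakf is added to the link, the same code takes the first branch without any source change.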

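I suppose the symbol value is set to 0 on ARM as well, although the question reports that there the call itself is replaced by a jump to the following instruction and a nop rather than a jump to address 0.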
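If the goal is for an unresolved weakf to behave as a no-op on every target, one option that does not depend on linker-specific handling is to supply a weak definition instead of a weak declaration. This is a sketch under that assumption, not something taken from the GCC or LD documentation; the default_calls counter is a hypothetical name added only to make the effect observable:

#include <stdio.h>

/* Hypothetical counter, used only to show that the default ran. */
int default_calls;

/* Weak *definition*: used unless another object file in the link
 * provides a normal (strong) definition of weakf. */
__attribute__((weak)) void weakf(void)
{
        default_calls++;
}

int main(void)
{
        weakf();        /* always safe: some definition is always present */
        printf("default weakf ran %d time(s)\n", default_calls);
        return 0;
}

The trade-off is that the call is never skipped: it always executes either the default body or the strong override.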