Linking with static library not equivalent to linking with its objects

Problem:

The firmware image generated when linking with a static library is different to the firmware image generated when linking with the objects directly extracted from the static library.

Both firmware images link without error and load successfully onto the microcontroller.

The latter binary (linked with objects) executes successfully and as expected, while the former (linked to the static library) does not.

The only warnings during compilation are unused-but-set-variable warnings in the manufacturer-supplied HAL, for variables that the active macro definitions leave unused in the compiled implementation, and unused-parameter warnings in various weak functions, also within the manufacturer-supplied HAL.

Description:

I am developing an embedded application for the STM32F407. Until now I have been working with one code base including the microcontroller's HAL & setup code, a driver for a specific peripheral, and an application utilizing the former two.

Since I wish to develop multiple applications using the same driver & HAL (both are complete and tested, so won't change often), I wish to compile & distribute the HAL and driver as a static library, which can then be linked with the application source.

The problem is that when linking the application and static library, the firmware image does not execute correctly on the microcontroller. When linking the application and the object files directly extracted from the static library, the firmware image executes as expected.

Specifically:

Created binary does not work when linking with static library using:

$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) Library/libtest.a

Created binary works when linking with objects extracted from static library using:

@cd Library && $(AR) x libtest.a && cd ..
$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) Library/*.o

In both cases:

CFLAGS = $(INCLUDES) $(DEFS) -ggdb3 -O0 -std=c99 -Wall -specs=nano.specs -nodefaultlibs
CFLAGS+= -fdata-sections -ffunction-sections -mcpu=cortex-m4 -march=armv7e-m -mthumb
CFLAGS+= -mfloat-abi=hard -mfpu=fpv4-sp-d16 -MD -MP -MF [email protected]

LDFLAGS = -T$(LDSCRIPT) -Wl,-static -Wl,-Map=$(@:.elf=.map),--cref -Wl,--gc-sections

I have compared the outputs of -Wl,--print-gc-sections as well as the app.map file, but enough is different between the two builds that no one thing jumps out as being wrong. I have also tried without -Wl,--gc-sections, to no avail.

The output of arm-none-eabi-size of the two firmware images is:

 text      data     bss     dec     hex filename
43464        76    8568   52108    cb8c workingapp.elf

 text      data     bss     dec     hex filename
17716        44    8568   26328    66d8 brokenapp.elf

A similar size discrepancy can be seen when compiling without -Wl,--gc-sections.

Using arm-none-eabi-gdb to debug the microcontroller's execution, the faulty firmware image enters an infinite loop when the WWDG interrupt occurs. This interrupt is not enabled in the firmware and thus the interrupt handler defaults to the Default_Handler (an infinite loop). This interrupt does not occur when running the working firmware image.

The WWDG interrupt occurring is actually a red herring, as described in the accepted answer.

--Mike

Mike Hamer, asked Jan 08 '23 08:01


2 Answers

Summary:

The issue was that not all objects from the static library were being included in the firmware image. This is solved by surrounding the static library with the --whole-archive and --no-whole-archive linker flags:

 $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(APPOBJECTS) -Wl,--whole-archive Library/libtest.a -Wl,--no-whole-archive

The issue arises because if the linker includes a library object with weak symbol definitions, it considers these symbols defined, and no longer searches for their (strong) definitions. Hence the object with strong definitions may or may not be included, depending on search order and what other symbols it defines.

Solution path:

Using arm-none-eabi-gdb to debug, it appeared that the disabled WWDG interrupt was occurring and calling the Default_Handler. This turned out to be a red herring... which has occurred often enough that it led me to the answer via the "STM32 WWDG interrupt firing when not configured" Stack Overflow post.

Upon reading this post and learning that gdb's function-name reporting is often inaccurate for functions that share the same memory address, I checked the generated .map file for the faulty firmware image. It confirmed that WWDG_IRQHandler was located at the same memory address as the majority of IRQHandlers, including the IRQHandlers for interrupts that are defined and used by the system (e.g. some timer interrupts).

Furthermore, all interrupts defined in the stm32f4xx_it.o object (which defines the IRQHandlers for interrupts used by the system, and which is included in the static library) pointed to the memory address of the Default_Handler, and the respective IRQHandler symbols were listed as being supplied by startup_stm32f407xx.o.
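The address check described above can also be done mechanically. The following is a minimal Python sketch, assuming symbol listings in the three-column "address type name" form printed by arm-none-eabi-nm. The sample listing, its addresses, and the helper names group_by_address and aliases_of are made up for illustration and are not taken from the original build.

```python
from collections import defaultdict

# Hypothetical sample in the "address type name" layout of nm output.
# Addresses and symbol selection are invented for illustration only.
SAMPLE_NM = """\
08000f30 W TIM2_IRQHandler
08000f30 W WWDG_IRQHandler
08000f30 t Default_Handler
08000a10 T main
08000f30 W TIM3_IRQHandler
         U some_undefined_symbol
"""


def group_by_address(nm_output):
    """Map each address to the sorted list of symbol names located there."""
    groups = defaultdict(list)
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column; skip them
        address, _kind, name = parts
        groups[address].append(name)
    return {addr: sorted(names) for addr, names in groups.items()}


def aliases_of(nm_output, symbol):
    """Return every other symbol that shares an address with `symbol`."""
    for names in group_by_address(nm_output).values():
        if symbol in names:
            return [n for n in names if n != symbol]
    return []


if __name__ == "__main__":
    # Handlers listed here still resolve to the default infinite loop.
    print(aliases_of(SAMPLE_NM, "Default_Handler"))
```

Any handler that the application is supposed to implement but that shows up as an alias of Default_Handler is a sign that its strong definition was not linked.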

I then checked which object files were actually linked into the firmware image (perl -n -e '/libtest\.a\((.*?)\)/ && print "$1\n"' app.map | sort -u) and found that only a subset of objects were linked.
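For comparison, here is a rough Python equivalent of that one-liner, extended to report which archive members were left out. It is only a sketch: the map excerpt is a hypothetical sample, and the function names linked_members and missing_members are my own. The full member list would have to come from elsewhere, for example from the output of arm-none-eabi-ar t Library/libtest.a.

```python
import re

# Hypothetical excerpt of a linker map file; a real one is far longer.
SAMPLE_MAP = """\
Library/libtest.a(startup_stm32f407xx.o)
                              (Reset_Handler)
 .text.HAL_Init  0x08000200  0x40 Library/libtest.a(stm32f4xx_hal.o)
 .text.main      0x08000240  0x80 main.o
 .text.HAL_Delay 0x080002c0  0x20 Library/libtest.a(stm32f4xx_hal.o)
"""


def linked_members(map_text, archive="libtest.a"):
    """Return the sorted, de-duplicated archive members named in the map."""
    pattern = re.compile(re.escape(archive) + r"\((.*?)\)")
    return sorted(set(pattern.findall(map_text)))


def missing_members(map_text, archive_members, archive="libtest.a"):
    """Return archive members that never appear in the map file."""
    linked = set(linked_members(map_text, archive))
    return sorted(set(archive_members) - linked)


if __name__ == "__main__":
    # Assumed member list, as `ar t` would print it for this made-up archive.
    members = ["startup_stm32f407xx.o", "stm32f4xx_hal.o", "stm32f4xx_it.o"]
    print("linked: ", linked_members(SAMPLE_MAP))
    print("missing:", missing_members(SAMPLE_MAP, members))
```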

Further inspection of startup_stm32f407xx.s showed that it defines many weak symbols, e.g.:

.weak TIM2_IRQHandler

During the process of linking a static library, the linker searches the library for undefined symbols and includes the first object it finds to define these symbols. It then removes the symbol from the undefined list, as well as any other undefined symbols that are defined by the included object.

My guess as to what happened is that the linker found an otherwise-undefined symbol in startup_stm32f407xx.o and included the object. It considered all IRQHandler symbols to be defined by the weak definitions therein. The object stm32f4xx_it.o was never included since it did not define any undefined symbols. This happened a number of times, with a number of different object files; sometimes the strong symbols were included, sometimes the weak symbols were included, depending on which object was searched first. Interestingly (yet unsurprisingly), if the weak definition is removed, the object containing the strong definition is included, and all strong definitions from that file (correctly) override the already-included weak definitions.
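To make the guess above concrete, here is a toy Python model of the archive search. It is a deliberately simplified sketch and not how any real linker is implemented: the Obj class, the link function, and the symbol sets assigned to each object are assumptions chosen to mirror the situation described, and the entry point is modeled as a pseudo-object that references Reset_Handler.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A toy object file: strong definitions, weak definitions, references."""
    name: str
    strong: set = field(default_factory=set)
    weak: set = field(default_factory=set)
    undefined: set = field(default_factory=set)


def link(app_objects, archive, whole_archive=False):
    """Return {symbol: name of the object providing it} for a toy link."""
    loaded = list(app_objects)
    if whole_archive:
        loaded += archive  # every member is included unconditionally
    else:
        pending = list(archive)
        changed = True
        while changed:
            changed = False
            defined = set().union(*(o.strong | o.weak for o in loaded))
            unresolved = set().union(*(o.undefined for o in loaded)) - defined
            for member in pending:
                # A member is pulled in only if it defines something that is
                # still unresolved; a weak definition already counts as defined.
                if unresolved & (member.strong | member.weak):
                    loaded.append(member)
                    pending.remove(member)
                    changed = True
                    break
    providers = {}
    for o in loaded:
        for sym in o.weak:
            providers.setdefault(sym, o.name)
    for o in loaded:
        for sym in o.strong:
            providers[sym] = o.name  # strong definitions override weak ones
    return providers


# Assumed symbol sets, loosely mirroring the objects named in this answer.
APP = [
    Obj("<entry>", undefined={"Reset_Handler"}),
    Obj("main.o", strong={"main"}, undefined={"HAL_Init"}),
]
ARCHIVE = [
    Obj("startup_stm32f407xx.o", strong={"Reset_Handler"},
        weak={"TIM2_IRQHandler"}, undefined={"main"}),
    Obj("stm32f4xx_it.o", strong={"TIM2_IRQHandler"}),
    Obj("stm32f4xx_hal.o", strong={"HAL_Init"}),
]

if __name__ == "__main__":
    print(link(APP, ARCHIVE)["TIM2_IRQHandler"])
    print(link(APP, ARCHIVE, whole_archive=True)["TIM2_IRQHandler"])
```

In this model the plain archive link leaves TIM2_IRQHandler provided by the weak definition in the startup object, because nothing ever requests stm32f4xx_it.o. With whole_archive=True the member is loaded regardless and its strong definition wins, which matches the effect of the --whole-archive fix.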

Having solved the problem, I'm not sure where to go from here. Is this a linker bug?

Mike Hamer, answered Jan 31 '23 08:01


You'll get a better answer if you can explain what "the binary doesn't work" really means.

Are you getting a binary that your programming tools won't load into the chip at all?

If so, look carefully at linker output on the command line.

Are you producing something you can load into the chip and not seeing the expected behavior?

If so, use a hardware debugger. Step through the code until something breaks, or let it run, then halt it and see where you ended up.

Chances are, you're just uncovering a bug that's always been in the code by rearranging where everything goes in memory. Array overflows, bad pointer dereferences, and uninitialized variables are typical culprits. Switching on -Wextra and -Wall can help uncover this stuff.

One other thought: Make sure your LDSCRIPT has the correct flash & RAM sizes for the actual part number (i.e. is not for a different part in the family).

Brian McFarland, answered Jan 31 '23 06:01