We have a message processing system with high performance demands. Recently we have noticed that the first message takes many times longer than subsequent messages. A good deal of transformation and message augmentation happens as a message passes through our system, much of it done by external libraries.
I just profiled this issue (using callgrind), comparing a "run" of just one message with a "run" of many messages (to provide a baseline for comparison).
The main difference I see is the function do_lookup_x taking up a huge amount of time. Looking at the various calls to this function, they all seem to be made from the common function _dl_runtime_resolve. I am not sure what this function does, but to me it looks as though this is the first time the various shared libraries are being used, and they are being loaded into memory by the dynamic linker at that point.
Is this a correct assumption? That the binary will not load the shared libraries into memory until they are about to be used, so we see a massive slowdown on the first message but on none of the subsequent ones?
How do we go about avoiding this?
Note: We operate on the microsecond scale.
From the ld.so(8) man page, ENVIRONMENT section:

    LD_BIND_NOW
           (libc5; glibc since 2.1.1) If set to a non-empty string, causes
           the dynamic linker to resolve all symbols at program startup
           instead of deferring function call resolution to the point when
           they are first referenced. This is useful when using a debugger.
So:

    LD_BIND_NOW=y ./timesensitiveapp
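If you want to confirm that symbol resolution is really where the time is going, glibc's dynamic linker can report its own relocation statistics. A quick sketch, assuming a glibc-based system (the binary name is just the placeholder from above):

    # Default lazy binding: function symbols are resolved on first call
    LD_DEBUG=statistics ./timesensitiveapp

    # Eager binding: all symbols are resolved before main() runs
    LD_BIND_NOW=y LD_DEBUG=statistics ./timesensitiveapp

With LD_BIND_NOW set, the resolution cost moves from the first message to process startup, which is usually the right trade-off for a latency-sensitive service.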
As an alternative to Ignacio Vazquez-Abrams's runtime suggestion, you can do the same thing at link time. When you link your shared library, pass the -z now flag to the linker.
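For example, with gcc the flag can be forwarded to the linker via -Wl. A sketch only; the file and library names here are placeholders, not ones from the question:

    # Build a shared library that asks the dynamic linker to resolve
    # all of its symbols as soon as it is loaded
    gcc -shared -fPIC -Wl,-z,now -o libaugment.so augment.c

    # The same flag works when linking the executable itself
    gcc -Wl,-z,now -o timesensitiveapp main.o -L. -laugment

    # Verify: the dynamic section should now show the BIND_NOW / NOW flags
    readelf -d timesensitiveapp | grep -E 'FLAGS'

This bakes the behaviour into the binaries themselves, so you do not have to remember to set LD_BIND_NOW in the environment of every deployment.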