I have a really odd situation with dynamic symbol binding on OS X that I'm hoping to get some clues on how to resolve.
I have an application, written in C, which uses dlopen() to dynamically load modules at runtime. Some of these modules export global symbols, which may be used by other modules loaded later.
We have one module (which I'll call weird_module.so) which exports global symbols, one of which is weird_module_function. If weird_module.so gets linked with a particular library (which I'll call libsomething.dylib), then weird_module_function can't be bound to. But if I remove the -lsomething when linking weird_module.so, then I can bind to weird_module_function.
What could possibly be going on with libsomething.dylib that would cause weird_module.so to not export symbols? Are there things I can do to debug how symbols get exported (similar to how I can use DYLD_PRINT_BINDINGS to debug how they get bound)?
$ LDFLAGS="-bundle -mmacosx-version-min=10.6 -Xlinker -undefined -Xlinker dynamic_lookup /usr/lib/bundle1.o"
$ gcc -o weird_module.so ${LDFLAGS} weird_module.o -lsomething
$ nm weird_module.so | grep '_weird_module_function$'
00000000000026d0 T _weird_module_function
$ gcc -o other_module.so ${LDFLAGS} other_module.o -lsomething
$ nm other_module.so | grep '_weird_module_function$'
U _weird_module_function
$ run-app
Loading weird_module.so
Loading other_module.so
dyld: lazy symbol binding failed: Symbol not found: _weird_module_function
Referenced from: other_module.so
Expected in: flat namespace
dyld: Symbol not found: _weird_module_function
Referenced from: other_module.so
Expected in: flat namespace
# Now relink without -lsomething
$ gcc -o weird_module.so ${LDFLAGS} weird_module.o
$ nm weird_module.so | grep '_weird_module_function$'
00000000000026d0 T _weird_module_function
$ run-app
Loading weird_module.so
Loading other_module.so
# No error!
Edit:
I tried putting together a minimal app to duplicate the problem, and in the course of doing so at least figured out one thing we were doing wrong. Two other facts are pertinent to duplicating the issue.
First is that run-app preloads the module with RTLD_LAZY | RTLD_LOCAL to inspect its metadata. The module is then dlclose()ed and reopened with either RTLD_LAZY | RTLD_GLOBAL or RTLD_NOW | RTLD_LOCAL, depending on the metadata. (For both modules in question, it reopens with RTLD_LAZY | RTLD_GLOBAL.)
Secondly, there turns out to be a symbol collision in weird_module.so and libsomething.dylib for a const global.
$ nm weird_module.so | grep '_something_global'
00000000000158f0 S _something_global
$ nm libsomething.dylib | grep '_something_global'
0000000000031130 S _something_global
I'm willing to consider that the duplicate symbol would put me in the realm of undefined behavior, so I'm dropping the question.
I tried to reproduce your scenario and I was able to get the same errors as you, i.e. dyld: lazy symbol binding failed followed by dyld: Symbol not found.
But it had nothing to do with linking against libsomething.dylib or not. What I did to trigger this error was just calling weird_module_function() from the constructor of other_module.so:
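// other_module.c
#include <stdio.h>
#include "weird_module.h"

// Runs automatically when other_module.so is loaded.
__attribute__((constructor)) void initialize_other_module(void)
{
    printf("%s\n", __PRETTY_FUNCTION__);
    // Bound lazily against the flat namespace at load time.
    weird_module_function();
}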
Here is how I loaded the modules:
// main.c
#include <stdio.h>
#include <dlfcn.h>

int main(int argc, const char * argv[])
{
    printf("\nLoading weird module\n");
    // RTLD_LOCAL keeps weird_module.so's symbols private to this handle.
    void *weird = dlopen("weird_module.so", RTLD_LAZY | RTLD_LOCAL);
    printf("weird: %p\n\n", weird);

    printf("Loading other module\n");
    void *other = dlopen("other_module.so", RTLD_LAZY | RTLD_LOCAL);
    printf("other: %p\n", other);

    return 0;
}
The dyld errors disappear if I remove the RTLD_LOCAL option when loading weird_module.so.
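As a rough illustration of that difference, here is a minimal sketch of a loader helper that selects RTLD_GLOBAL or RTLD_LOCAL and reports dlerror() on failure. The helper name open_module and its global parameter are made up for this example; only dlopen(), dlerror() and the RTLD_* flags come from dlfcn.h.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original app): open a module either
 * with RTLD_GLOBAL, so later modules can bind to its exported symbols, or
 * with RTLD_LOCAL, which keeps those symbols private to the handle. */
static void *open_module(const char *path, int global)
{
    assert(path != NULL);

    int flags = RTLD_LAZY | (global ? RTLD_GLOBAL : RTLD_LOCAL);
    void *handle = dlopen(path, flags);
    if (handle == NULL) {
        /* dlerror() describes the most recent dlopen/dlsym/dlclose failure. */
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, err ? err : "unknown error");
    }
    return handle;
}
```

In the reproduction above, calling open_module("weird_module.so", 1) before open_module("other_module.so", 0) corresponds to the variant where the dyld errors went away.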
The same error also occurs if you call weird_module_function from a libsomething.dylib constructor but it happens before main is called so that’s probably not what is happening to you.
But maybe the libsomething.dylib constructor is where you should look to find how libsomething.dylib is influencing your modules loading process. You can set the DYLD_PRINT_INITIALIZERS environment variable to YES in order to find out what constructors are called.
A few other things to check:
- Are the modules really reopened with RTLD_LAZY | RTLD_GLOBAL? The only way I could get the dyld errors was by passing the RTLD_LOCAL option.
- Is the dlclose call successful (returns 0)? If, for example, your module contains Objective-C code, it will not be unloaded.
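Both checks can be sketched in one place. The code below is only an illustration of the preload, close, and reopen sequence described in the question, with the dlclose() result checked and an extra probe using RTLD_NOLOAD to see whether the image is still resident afterwards. The names reload_global and last_error are invented for this example, and the RTLD_NOLOAD probe is my own addition rather than something the original app is known to do.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: never hand a NULL string to fprintf. */
static const char *last_error(void)
{
    const char *err = dlerror();
    return err ? err : "unknown error";
}

/* Hypothetical helper mirroring the preload/close/reopen sequence.
 * Returns the handle from the final RTLD_GLOBAL open, or NULL on failure. */
static void *reload_global(const char *path)
{
    assert(path != NULL);

    /* First pass: private load, e.g. to inspect metadata. */
    void *probe = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (probe == NULL) {
        fprintf(stderr, "preload failed: %s\n", last_error());
        return NULL;
    }

    /* dlclose() returns 0 on success and nonzero on failure. */
    if (dlclose(probe) != 0) {
        fprintf(stderr, "dlclose failed: %s\n", last_error());
    }

    /* RTLD_NOLOAD only yields a handle if the image is still loaded. */
    void *resident = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (resident != NULL) {
        fprintf(stderr, "%s is still resident after dlclose\n", path);
        dlclose(resident); /* drop the reference taken by the probe */
    }

    /* Second pass: reopen with global visibility. */
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "reopen failed: %s\n", last_error());
    }
    return handle;
}
```

If the "still resident" message shows up for weird_module.so, the second dlopen() may not behave like a fresh load, which would be worth ruling out before looking further at libsomething.dylib.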