I have a elf binary which has been statically linked to libc. I do not have access to its C code. I would like to use OpenOnload library, which has implementation of sockets in user-space and therefore provides lower latency compared to standard libc versions. OpenOnload implements standard socket api, and overrides libc version using LD_PRELOAD. But, since this elf binary is statically linked, it cannot use the OpenOnload version of the socket API.
I believe that converting this binary to dynamically link with OpenOnload is possible, with the following steps:
As a first cut, I tried just adding 3 PT_LOAD segments. New segment headers were added after existing PT_LOAD segment headers. Also, vm_addr of existing segments was not modified. File offsets of existing segments were shifted below to next aligned address based on p_align. New PT_LOAD segments were added in the file at end of the file.
After re-writing the file, when I ran it, it was loaded properly by the kernel, but then it immediately seg-faulted.
My questions are:
What you are attempting is not possible in any automated way. At the time of static linking, all relocation information identifying calls to libc as calls to libc has been resolved and removed. If debugging symbols exist in the binary, it's possible to identify "this range of bytes in the text segment corresponds to such-and-such libc function", but there is no way to identify references to the function, which will be embedded in the instruction byte stream with no markup to identify them. You could use heuristics based on disassembly, but they would be incomplete and unreliable (possibility of both false negatives and false positives).
As far as shifting offsets, you absolutely cannot change anything about the load addresses for a static linked binary. If you need to insert headers before the load segments, you'd have to insert a whole page, and update the file offsets in the program header table (adding 1 page to them) while leaving the virtual address load offsets the same. However, since what you're trying to do is not possible overall, the offset-shifting issue is the least of your worries.
Perhaps, if the program doesn't require high performance, you could run it under qemu app-level emulation, with qemu going through the sockets emulation/wrapper.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With