Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CPU dependent code: how to avoid function pointers?

Tags:

c

cpu

ld

arm

elf

I have performance critical code written for multiple CPUs. I detect CPU at run-time and based on that I use appropriate function for the detected CPU. So, now I have to use function pointers and call functions using these function pointers:

void do_something_neon(void);
void do_something_armv6(void);

void (*do_something)(void);

if(cpu == NEON) {
    do_something = do_something_neon;
}else{
    do_something = do_something_armv6;
}

//Use function pointer:
do_something(); 
...

Not that it matters, but I'll mention that I have optimized functions for different cpu's: armv6 and armv7 with NEON support. The problem is that by using function pointers in many places the code become slower and I'd like to avoid that problem.

Basically, at load time linker resolves relocs and patches code with function addresses. Is there a way to control better that behavior?

Personally, I'd propose two different ways to avoid function pointers: create two separate .so (or .dll) for cpu dependent functions, place them in different folders and based on detected CPU add one of these folders to the search path (or LD_LIB_PATH). The, load main code and dynamic linker will pick up required dll from the search path. The other way is to compile two separate copies of library :) The drawback of the first method is that it forces me to have at least 3 shared objects (dll's): two for the cpu dependent functions and one for the main code that uses them. I need 3 because I have to be able to do CPU detection before loading code that uses these cpu dependent functions. The good part about the first method is that the app won't need to load multiple copies of the same code for multiple CPUs, it will load only the copy that will be used. The drawback of the second method is quite obvious, no need to talk about it.

I'd like to know if there is a way to do that without using shared objects and manually loading them at runtime. One of the ways would be some hackery that involves patching code at run-time, it's probably too complicated to get it done properly). Is there a better way to control relocations at load time? Maybe place cpu dependent functions in different sections and then somehow specify what section has priority? I think MAC's macho format has something like that.

ELF-only (for arm target) solution is enough for me, I don't really care for PE (dll's).

thanks

like image 201
Pavel P Avatar asked Dec 12 '22 04:12

Pavel P


2 Answers

You may want to lookup the GNU dynamic linker extension STT_GNU_IFUNC. From Drepper's blog when it was added:

Therefore I’ve designed an ELF extension which allows to make the decision about which implementation to use once per process run. It is implemented using a new ELF symbol type (STT_GNU_IFUNC). Whenever the a symbol lookup resolves to a symbol with this type the dynamic linker does not immediately return the found value. Instead it is interpreting the value as a function pointer to a function that takes no argument and returns the real function pointer to use. The code called can be under control of the implementer and can choose, based on whatever information the implementer wants to use, which of the two or more implementations to use.

Source: http://udrepper.livejournal.com/20948.html

Nonetheless, as others have said, I think you're mistaken about the performance impact of indirect calls. All code in shared libraries will be called via a (hidden) function pointer in the GOT and a PLT entry that loads/calls that function pointer.

like image 192
R.. GitHub STOP HELPING ICE Avatar answered Dec 31 '22 01:12

R.. GitHub STOP HELPING ICE


For the best performance you need to minimize the number of indirect calls (through pointers) per second and allow the compiler to optimize your code better (DLLs hamper this because there must be a clear boundary between a DLL and the main executable and there's no optimization across this boundary).

I'd suggest doing these:

  1. moving as much of the main executable's code that frequently calls DLL functions into the DLL. That'll minimize the number of indirect calls per second and allow for better optimization at compile time too.
  2. moving almost all your code into separate CPU-specific DLLs and leaving to main() only the job of loading the proper DLL OR making CPU-specific executables w/o DLLs.
like image 36
Alexey Frunze Avatar answered Dec 31 '22 01:12

Alexey Frunze