Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to call an assembly function from C++ dynamically?

REQUIREMENT: For a certain project we have unique requirement. The application supports an expression language that allows the user to define their own complex expressions that can be evaluated at run time (many hundred times a second) and they need to be executed at machine level for performance.

WORKING: Our expression parser translates the script into corresponding assembly language routine perfectly. We checked it by statically linking the object files generated with our C test program and they produce correct result. Since the client can change the script anytime, our program (at run time) detects the change, calls the parser which generates the corresponding assembly routine. We then call the assembler from back end to create the object code.

PROBLEM

How can we call this assembly routine dynamically from the C++ program (Loader)?

We are not supposed to call the C++ compiler to link it with the loader because the loader already would have other subroutines running and we cannot take the loader off, recompile and then execute the new loader program.

I tried searching for a solution online but every time the results are littered with .NET assembly dynamic calling. Our app has nothing to do with .NET.

like image 723
Simple Fellow Avatar asked Dec 14 '22 12:12

Simple Fellow


1 Answers

First, the "generated plugin" approach (on Linux; my answer focuses on Linux but could be adapted to Windows with some effort; you could use many-platform frameworks like Qt or POCO or Glib from GTK; then all wrap plugin loading abilities à la dlopen with a common API that you could use on Windows, on Linux, on MacOSX, on Android) :

  • generate C (or assembly) code in some file /tmp/generated01.c (you might even generate C++ code using standard C++ containers, but its compilation would be significantly slower; beware of name mangling so emit and use extern "C" functions; read the C++ dlopen mini HowTo). See this answer explaining why generating C is worthwhile (and could be better, and more portable, than generating assembler code).
  • run (using fork+execve+waitpid, or simply system) a compilation of that generated file into a shared object /tmp/genenerated01.so by running gcc -fPIC -Wall -O /tmp/generated01.c -shared -o /tmp/generated01.so command; you practically need to get position-independent code, hence the -fPIC flag. If using dlopen on your generated assembler code you'll need to improve your assembler generator to emit PIC code.
  • dlopen that new /tmp/generated01.so (so use the dynamic linker), see dlopen(3); you could even remove the now useless generated C file /tmp/generated01.c
  • dlsym the relevant symbols to get function pointers to the generated code, see dlsym(3); your application would simply call the generated code using these function pointers.
  • when you are sure that you don't need any functions from it and that no call frame uses it, you could dlclose that shared object library (but you might accept to leak some address space by not calling dlclose at all)

The above approach is worthwhile and can be used a big lot of times (my manydl.c demonstrates that you could dlopen a million different shared objects), and is practically even compatible (even when emitting C code!) with an interactive Read-Eval-Print-Loop -on most current desktops and laptops and servers-, since most of the time the generated /tmp/generated01.c would be quite small (e.g. a few hundred lines at most) to be very quickly generated and compiled (by gcc, etc...). I am even using this in MELT for its REPL mode. On Linux this plugin approach generally requires to link the main application with -rdynamic (so that dlopen-ed plugins can reference and call functions from the main application).


Then, other approaches could be to use some Just-In-Time compilation library, like

  • GNU lightning (which emits slow machine code very quickly - so very short JIT emission time, but the generated code is running slowly since it is very unoptimized)
  • asmjit; it is x86-64 specific, and enables you to generate individual x86-64 machine instructions
  • GNU libjit is available for several platforms, and offer an "interpreter" mode for other platforms
  • LLVM (part of Clang/LLVM compiler, usable as a JIT library)
  • GCCJIT (a new JIT library front-end to GCC)

Grossly speaking, the first elements of that list are able to emit JIT machine code fairly quickly, but that code won't run as fast as compiling with gcc -fPIC -O1 or -O2 the equivalent generated C code (but would run typically 2x to 5x slower!); the last two elements (LLVM & GCCJIT) are compiler based: so they are able to optimize and emit efficient code, at the expense of slower JIT code emission. All the JIT libraries are able (like dlsym does for plugins) to give function pointers to newly JIT-constructed functions.

Notice that there is a trade-off to be made: some techniques are able to generate quickly some machine code, if you accept that generated code to later run a bit slowly; other techniques (notably GCCJIT or LLVM) are spending time to optimize the generated machine code, so takes more time to emit the machine code, but that code would later run quickly. You should not expect both (small generation time, quick execution time), since there is no such thing as a free lunch.


I believe that generating manually some assembler code is practically not worthwhile. You won't be able to generate very optimized code (because optimization is a very difficult art, and both GCC and Clang have millions of source line code for optimization passes), unless you spend many years of work for that. Using some JIT library is easier, and "compiling" to C or C++ is also quite easy (you leave the burden of optimization to the C compiler you are calling).


You could also consider rewriting your application into some language with homoiconicity and metaprogramming abilities (e.g. multi-stage programming), such as Common Lisp (and many others, e.g. those providing eval). Its SBCL implementation is always emitting machine code...

You could also embed an interpreter like Lua -perhaps even LuaJit- or Guile in your application. The main advantage of embedding an existing language is that there are resources (books, modules, ...) and community of people knowing them (designing a good language is difficult!). Also, the embedded interpreter library is well designed and probably well debugged (since used a lot), and some of them are fast enough (since using bytecode techniques).

like image 92
Basile Starynkevitch Avatar answered Jan 04 '23 22:01

Basile Starynkevitch