How can I simplify code generation at runtime?

I'm working on a piece of software which generates assembler code at runtime. For instance, here's a very simple function which generates the code for calling the GetCurrentProcess function (with a separate branch for the Win64 ABI):

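#include <string.h>   // memcpy
#include <windows.h>  // FARPROC

void genGetCurrentProcess( char *codePtr, FARPROC addressForGetCurrentProcessFunction )
{
#ifdef _WIN64
  // mov rax, addressForGetCurrentProcessFunction
  *codePtr++ = 0x48;
  *codePtr++ = 0xB8;
  // Copy the 8-byte address; incrementing the result of a cast is a
  // non-standard extension, so use memcpy and advance the pointer instead.
  memcpy( codePtr, &addressForGetCurrentProcessFunction, sizeof( FARPROC ) );
  codePtr += sizeof( FARPROC );

  // call rax
  *codePtr++ = 0xFF;
  *codePtr++ = 0xD0;
#else
  // mov eax, addressForGetCurrentProcessFunction
  *codePtr++ = 0xB8;
  // Copy the 4-byte address (see the note in the 64-bit branch).
  memcpy( codePtr, &addressForGetCurrentProcessFunction, sizeof( FARPROC ) );
  codePtr += sizeof( FARPROC );

  // call eax
  *codePtr++ = 0xFF;
  *codePtr++ = 0xD0;
#endif
}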

Usually I'd use inline assembler, but alas - this doesn't seem to be possible with the 64-bit MSVC compilers anymore. While I'm at it - this code should work with MSVC6 up to MSVC10 and also MinGW. There are many more functions like genGetCurrentProcess; they all emit assembler code, and many of them take the function pointers to be called as arguments.

The annoying thing about this is that modifying this code is error-prone and we've got to take care of ABI-specific things manually (for instance, reserving 32 bytes of stack space for register spilling before calling functions).

So my question is - can I simplify this code for generating assembler code at runtime? My hope was that I could somehow write the assembler code directly (possibly in an external file which is then assembled using ml/ml64), but it's not clear to me how this would work if some of the bytes in the assembled code are only known at runtime (the addressForGetCurrentProcessFunction value in the above example, for instance). Maybe it's possible to assemble some code but assign 'labels' to certain locations in the code so that I can easily modify the code at runtime and then copy it into my buffer?

Frerich Raabe asked Mar 06 '12 08:03


2 Answers

Take a look at asmjit. It is a C++ library for runtime code generation. It supports x64 and probably most of the existing extensions (FPU, MMX, 3DNow!, SSE, SSE2, SSE3, SSE4). Its interface resembles assembly syntax, and it encodes the instructions correctly for you.

Tamás Szelei answered Sep 28 '22 09:09


You could depend on a real assembler to do the work for you - one that generates binary output is obviously the best. Consider looking at yasm or fasm (there are some posts on the fasm forums about doing a DLL version, so you don't have to write a temporary assembly file, launch an external process, and read the output file back, but I dunno if it's been updated for later versions).
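A different route from running the assembler at runtime, closer to the 'labels' idea in the question: assemble the snippet once at build time, embed the resulting bytes, and patch the runtime-only value at a known offset. The sketch below is only an illustration. The names kCallTemplate, kTargetOffset and emitCallFromTemplate are made up, and the byte array is written out by hand here, where a real build would presumably take it from the assembler's flat binary output and get the offset from a label in the assembly source.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical template, i.e. the machine code for
//   mov rax, 0xCCCCCCCCCCCCCCCC   (48 B8 + 8-byte immediate)
//   call rax                      (FF D0)
// The placeholder is deliberately a full 64-bit value so that an
// assembler cannot pick a shorter encoding for the mov.
static const unsigned char kCallTemplate[] = {
    0x48, 0xB8, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC,
    0xFF, 0xD0
};

// Offset of the placeholder inside the template (assumed to come from a label).
static const std::size_t kTargetOffset = 2;

// Copies the template into `out` and overwrites the placeholder with `target`,
// least significant byte first, as x86 expects. Returns the bytes written.
// `out` must have room for sizeof kCallTemplate bytes.
std::size_t emitCallFromTemplate(unsigned char* out, unsigned long long target)
{
    std::memcpy(out, kCallTemplate, sizeof kCallTemplate);
    for (std::size_t i = 0; i < 8; ++i)
        out[kTargetOffset + i] = static_cast<unsigned char>(target >> (8 * i));
    return sizeof kCallTemplate;
}
```

This keeps the instruction encoding in the assembler's hands. The C++ side only needs the template bytes and the patch offsets, at the cost of one template per target (32-bit and 64-bit).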

This might be overkill if your needs are relatively simple, though. I'd consider writing a C++ Assembler class supporting just the mnemonics you need, along with some helper functions like GeneratePrologue, GenerateEpilogue, InstructionPointerRelativeAddress and such. This would allow you to write pseudo-assembly and have the helper functions take care of 32/64-bit issues.
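A minimal sketch of what such a class might look like, covering only the instructions from the question's example. All names here (Assembler, movAccumulator, callAbsolute and so on) are hypothetical, and the code assumes a reasonably modern C++ compiler; older ones such as MSVC6 would need different integer types. It reserves the 32 bytes of Win64 spill space mentioned in the question but does not deal with stack alignment, which real code would also have to get right.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal emitter: supports only what the example needs.
class Assembler {
public:
    explicit Assembler(bool is64Bit) : is64Bit_(is64Bit) {}

    // mov rax, imm64 (64-bit mode) or mov eax, imm32 (32-bit mode)
    void movAccumulator(unsigned long long value) {
        if (is64Bit_) byte(0x48);            // REX.W: 64-bit operand size
        byte(0xB8);
        const int size = is64Bit_ ? 8 : 4;
        for (int i = 0; i < size; ++i)       // immediate, little-endian
            byte(static_cast<unsigned char>(value >> (8 * i)));
    }

    // call rax / call eax
    void callAccumulator() { byte(0xFF); byte(0xD0); }

    // sub rsp/esp, n and add rsp/esp, n (imm8 form, so n must be <= 127)
    void subStack(unsigned char n) { stackOp(0xEC, n); }
    void addStack(unsigned char n) { stackOp(0xC4, n); }

    // Helper that hides the ABI detail: in 64-bit mode the call is wrapped
    // in a 32-byte reservation for register spilling. Alignment of the
    // stack pointer is NOT handled here.
    void callAbsolute(unsigned long long target) {
        movAccumulator(target);
        if (is64Bit_) subStack(32);
        callAccumulator();
        if (is64Bit_) addStack(32);
    }

    const std::vector<unsigned char>& code() const { return code_; }

private:
    void byte(unsigned char b) { code_.push_back(b); }

    void stackOp(unsigned char modrm, unsigned char n) {
        assert(n <= 127);                    // imm8 is sign-extended
        if (is64Bit_) byte(0x48);
        byte(0x83);
        byte(modrm);
        byte(n);
    }

    bool is64Bit_;
    std::vector<unsigned char> code_;
};
```

With something along these lines, a generator like genGetCurrentProcess could shrink to a single callAbsolute call followed by copying code() into the executable buffer, and the 32/64-bit differences would live in one place.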

snemarch answered Sep 28 '22 07:09