
How to offload memory offset calculation from runtime in C/C++?

I am implementing a simple VM, and currently I am using runtime arithmetic to calculate individual program object addresses as offsets from base pointers.

I asked a couple of questions on the subject today, but I seem to be going slowly nowhere.

I did learn a couple of things, though. From question one - Object and struct member access and address offset calculation - I learned that modern processors have virtual addressing capabilities that allow memory offsets to be calculated without any additional cycles devoted to arithmetic.

And from question two - Are address offsets resolved during compile time in C/C++? - I learned that there is no guarantee of this happening when the offsets are computed manually.

By now it should be clear that what I want to achieve is to take advantage of the virtual memory addressing features of the hardware and offload those calculations from the runtime.

I am using GCC. As for the platform - I am developing on x86 under Windows, but since it is a VM I'd like it to run efficiently on all platforms supported by GCC.

So ANY information on the subject is welcome and will be much appreciated.

Thanks in advance!

EDIT: Some overview of my program code generation - during the design stage the program is built as a tree hierarchy, which is then recursively serialized into one contiguous memory block, indexing the objects and calculating their offsets from the beginning of the program memory block along the way.
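Roughly, the serialization step looks like this (a simplified sketch - the Node and Blob types and their fields are illustrative placeholders, not my actual code):

#include <stddef.h>
#include <string.h>

typedef struct Node {
    const void  *payload;       /* raw bytes of this program object */
    size_t       size;
    size_t       offset;        /* filled in: offset from the start of the block */
    struct Node *children;
    size_t       child_count;
} Node;

typedef struct {
    unsigned char *base;        /* the contiguous program memory block */
    size_t         used;
} Blob;

/* Depth-first serialization: copy each object into the block and record
   its offset from the beginning of the block. */
static void serialize(Node *n, Blob *out)
{
    n->offset = out->used;
    memcpy(out->base + out->used, n->payload, n->size);
    out->used += n->size;
    for (size_t i = 0; i < n->child_count; ++i)
        serialize(&n->children[i], out);
}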

EDIT 2: Here is some pseudo code of the VM:

switch (*instruction)
   case 1: call_fn1(*(instruction+1));
           instruction += 1 + sizeof(parameter1); break;
   case 2: call_fn2(*(instruction+1), *(instruction+1+sizeof(parameter1)));
           instruction += 1 + sizeof(parameter1) + sizeof(parameter2); break;
   case 3: instruction += *(instruction+1); break;

Case 1 is a function that takes one parameter, which is found immediately after the instruction, so it is read at an offset of 1 byte from the instruction. The instruction pointer is incremented by 1 plus the size of the first parameter to find the next instruction.

Case 2 is a function that takes two parameters; as before, the first parameter is read at an offset of 1 byte, and the second at an offset of 1 byte plus the size of the first parameter. The instruction pointer is then incremented by the size of the instruction plus the sizes of both parameters.

Case 3 is a goto statement: the instruction pointer is incremented by an offset which immediately follows the goto instruction.
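For completeness, here is a compilable sketch of that dispatch loop, assuming one-byte opcodes, 4-byte operands and a little-endian host; the opcode names and fn1/fn2 are placeholders:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

enum { OP_HALT = 0, OP_FN1 = 1, OP_FN2 = 2, OP_GOTO = 3 };

static void fn1(int32_t a)            { printf("fn1(%d)\n", (int)a); }
static void fn2(int32_t a, int32_t b) { printf("fn2(%d, %d)\n", (int)a, (int)b); }

/* Read a 4-byte operand located off bytes past the current opcode. */
static int32_t operand(const uint8_t *ip, size_t off)
{
    int32_t v;
    memcpy(&v, ip + off, sizeof v);    /* memcpy avoids unaligned reads */
    return v;
}

static void run(const uint8_t *program)
{
    const uint8_t *ip = program;
    for (;;) {
        switch (*ip) {
        case OP_FN1:  fn1(operand(ip, 1));                 ip += 1 + 4; break;
        case OP_FN2:  fn2(operand(ip, 1), operand(ip, 5)); ip += 1 + 8; break;
        case OP_GOTO: ip += operand(ip, 1);                break;
        case OP_HALT: return;
        }
    }
}

int main(void)
{
    static const uint8_t demo[] = { OP_FN1, 42, 0, 0, 0,   /* fn1(42), little-endian operand */
                                    OP_HALT };
    run(demo);
    return 0;
}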

EDIT 3: To my understanding, the OS will provide each process with its own dedicated virtual address space. If so, does this mean the first address is always ... well, zero, so the offset from the first byte of the memory block is actually the very address of that element? If the address space is dedicated to every process, and I know the offset of my program memory block AND the offset of every program object from the first byte of that block, are the object addresses then resolved during compile time?

The problem is that those offsets are not available during the compilation of the C code; they only become known during the "compilation" phase that translates the program to bytecode. Does this mean there is no way to do the object memory address calculation for "free"?

How is this done in Java, for example, where only the virtual machine is compiled to machine code? Does this mean the calculation of object addresses takes a performance penalty because of runtime arithmetic?

asked Jul 15 '12 by dtech


2 Answers

Here's an attempt to shed some light on how the linked questions and answers apply to this situation.

The answer to the first question mixes two different things: the first is the addressing modes of X86 instructions, and the second is virtual-to-physical address mapping. The first is something that is done by compilers, and the second is something that is (typically) set up by the operating system. In your case you should only be worrying about the first.

Instructions in X86 assembly have great flexibility in how they access a memory address. Instructions that read or write memory have the address calculated according to the following formula:

segment + base + index * size + offset

The segment portion of the address is almost always the default DS segment and can usually be ignored. The base portion is given by one of the general purpose registers or the stack pointer. The index part is given by one of the general purpose registers and the size is either 1, 2, 4, or 8. Finally the offset is a constant value embedded in the instruction. Each of these components is optional, but obviously at least one must be given.

This addressing capability is what is generally meant when talking about computing addresses without explicit arithmetic instructions. There is a special instruction that one of the commenters mentioned, LEA, which does the address calculation but, instead of reading or writing memory, stores the computed address in a register.
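As a rough illustration (the exact instruction selection is up to the compiler, and the register names below assume the x86-64 System V calling convention), indexing into an array in C usually maps directly onto one of these addressing modes rather than onto separate arithmetic instructions:

#include <stddef.h>

/* GCC at -O1 or higher typically compiles this load to a single instruction
   along the lines of  mov eax, [rdi + rsi*4 + 8]  -- base, index, scale and
   constant offset all folded into the addressing mode. */
int load_element(const int *base, size_t i)
{
    return base[i + 2];
}

/* Computing the address without touching memory is where LEA comes in;
   here GCC can emit something like  lea rax, [rdi + rsi*4 + 8]. */
const int *element_address(const int *base, size_t i)
{
    return &base[i + 2];
}

You can inspect what your compiler actually produces with gcc -O2 -S.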

For the code you included in the question, it is quite plausible that the compiler would use these addressing modes to avoid explicit arithmetic instructions.

As an example, the current value of the instruction variable could be held in the ESI register. Additionally, each of sizeof(parameter1) and sizeof(parameter2) is a compile-time constant. In the standard X86 calling conventions, function arguments are pushed in reverse order (so the first argument is at the top of the stack), so the assembly code might look something like:

case1: 
  PUSH [ESI+1]
  CALL fn1
  ADD ESP,4 ; drop arguments from stack
  ADD ESI,5
  JMP end_switch
case2: 
  PUSH [ESI+5]
  PUSH [ESI+1]
  CALL fn2
  ADD ESP,8 ; drop arguments from stack
  ADD ESI,9
  JMP end_switch
case3:
  MOV ESI,[ESI+1]
  JMP end_switch
end_switch:

This assumes that the size of both parameters is 4 bytes. Of course, the actual code is up to the compiler, and it is reasonable to expect that the compiler will output fairly efficient code as long as you ask for some level of optimization.

answered by Geoff Reedy


You have a data item X in the VM, at relative address A, and an instruction that says (for instance) push X, is that right? And you want to be able to execute this instruction without having to add A to the base address of the VM's data area.

I have written a VM that solves this problem by mapping the VM's data area to a fixed Virtual Address. The compiler knows this Virtual Address, and so can adjust A at compile time. Would this solution work for you? Can you change the compiler yourself?

My VM runs on a smart card, and I have complete control over the OS, so it's a very different environment from yours. But Windows does have some facilities for allocating memory at a fixed address -- the VirtualAlloc function, for instance. You might like to try this out. If you do try it out, you might find that Windows is allocating regions that clash with your fixed-address data area, so you will probably have to load by hand any DLLs that you use, after you have allocated the VM's data area.
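A minimal sketch of that idea on Windows (the base address 0x20000000 and the region size are arbitrary choices for illustration; VirtualAlloc fails if anything already occupies that region):

#include <windows.h>
#include <stdio.h>

#define VM_DATA_BASE ((LPVOID)0x20000000)   /* fixed address known to the bytecode compiler */
#define VM_DATA_SIZE (16 * 1024 * 1024)

int main(void)
{
    void *data = VirtualAlloc(VM_DATA_BASE, VM_DATA_SIZE,
                              MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (data == NULL) {
        /* Some other allocation (a DLL, the heap, ...) already uses the region. */
        fprintf(stderr, "VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }
    /* The bytecode compiler can now emit absolute addresses computed as
       VM_DATA_BASE + offset, leaving no arithmetic for the VM to do at runtime. */
    VirtualFree(data, 0, MEM_RELEASE);
    return 0;
}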

But there will probably be unforeseen problems to overcome, and it might not be worth the trouble.

answered by TonyK