
How does mterp (Dalvik VM) organize its byte-code interpreter loop?

I am studying the Android Dalvik VM and ran into a question while reading the mterp code in the file vm/mterp/out/InterpC-portable.cpp. It is the Dalvik VM's main interpreter loop, which interprets the bytecode in a dex file. If I had written this file, I would have chosen a switch-case structure, something like this:

while (hasMoreIns()) {
    int ins = getNextIns();
    switch (ins) {
        case MOV:
            // interpret this instruction
            ...
            break;
        case ADD:
            ...
            break;
        ...
        default: break;
    }
}

However, what mterp uses is very different from what I expected; it uses some code that looks like magic to me:

FINISH(0);

HANDLE_OPCODE(OP_NOP)
   FINISH(1);
OP_END

HANDLE_OPCODE(OP_MOVE)
   ...
OP_END

...

I googled it and found that it appears to be a modified "threaded" execution style, which differs from the switch-case style and has better performance because it removes the branch operation at the top of the while loop. But I still can't understand this code or why it performs better. How does it find the next instruction to interpret?

asked Nov 13 '12 by Jin Chen

1 Answer

As a brief bit of guidance, the out directory is filled with preprocessed files and is not what I'd call a great thing to read, if you're trying to figure out the code. The source (per se) that corresponds to InterpC-portable.cpp is the contents of the portable and c directories.

In terms of how the code does opcode dispatch, you'll want to look at the definition of the FINISH macro, in portable/stubdefs.cpp:

# define FINISH(_offset) {                                                  \
        ADJUST_PC(_offset);                                                 \
        inst = FETCH(0);                                                    \
        if (self->interpBreak.ctl.subMode) {                                \
            dvmCheckBefore(pc, fp, self);                                   \
        }                                                                   \
        goto *handlerTable[INST_INST(inst)];                                \
    }

This macro is used at the end of each opcode definition and serves as the equivalent of a switch (opcode) statement. Briefly, this reads the instruction code unit pointed at by the PC — inst = FETCH(0) — grabs the opcode out of it — INST_INST(inst) — and uses that opcode as an index into the table of addresses of all the opcodes. The address is directly branched to with the goto statement.

The goto is a "computed goto," which is a non-standard GCC extension. You can read about it in the GCC manual, and you can also find a bit about the topic in the presentation I gave on Dalvik internals at Google IO back in 2008. (Find it at https://sites.google.com/site/io/dalvik-vm-internals.)

My talk also touches on the topic of the performance characteristics of this technique. Briefly, it saves some amount of branching and plays relatively nice with branch prediction. However, there are better ways to write an interpreter (as I cover in the talk, and as the CPU-specific Dalvik interpreters in fact work).

And for just a bit more of the larger context, compiling bytecode to native CPU instructions is in general going to result in faster execution than even the most well-tuned interpreter, assuming you have sufficient RAM to hold the compiled result. The trace-based Dalvik JIT that was introduced in Froyo was meant to make a tradeoff wherein modest amounts of extra RAM were used to achieve reasonably-fruitful performance gains.

answered Nov 11 '22 by danfuzz