 

VM Design: More opcodes or less opcodes? What is better?


Don't be shocked. This is a lot of text, but I'm afraid that without giving some detailed information I cannot really show what this is all about (and I might get a lot of answers that don't really address my question). And this is definitely not an assignment (as someone ridiculously claimed in a comment).

Prerequisites

Since this question can probably not be answered at all unless at least some prerequisites are set, here are the prerequisites:

  • The virtual machine code shall be interpreted. A JIT compiler is not forbidden, but the design should target an interpreter.
  • The VM shall be register based, not stack based.
  • The answer may neither assume that there is a fixed set of registers nor that there is an unlimited number of them; either one may be the case.

Further we need a better definition of "better". There are a couple of properties that must be considered:

  1. The storage space for the VM code on disk. Of course you could always scrap all optimizations here and just compress the code, but this has a negative effect on (2).
  2. Decoding speed. The best way to store the code is useless if it takes too long to transform that into something that can be directly executed.
  3. The storage space in memory. This code must be directly executable, with or without further decoding; if further decoding is involved, it happens during execution, every time the instruction is executed (decoding done only once while loading the code counts toward item 2).
  4. The execution speed of the code (taking common interpreter techniques into account).
  5. The VM complexity and how hard it is to write an interpreter for it.
  6. The amount of resources the VM needs for itself. (It is not a good design if the code the VM runs is 2 KB in size and executes in the blink of an eye, but the VM needs 150 MB to do this and its start-up time far exceeds the run time of the code it executes.)

Now for some examples of what I actually mean by more or fewer opcodes. It may look like the number of opcodes is fixed, since you need one opcode per operation. However, it's not that easy.

Multiple Opcodes for the Same Operation

You can have an operation like

ADD R1, R2, R3 

adding the values of R1 and R2, writing the result to R3. Now consider the following special cases:

ADD R1, R2, R2
ADD R1, 1, R1

These are common operations you'll find in a lot of applications. You can express them with the already existing opcode (unless you need a different one for the second case because it takes an immediate int value instead of a register). However, you could also create special opcodes for these:

ADD2 R1, R2
INC R1

Same as before. Where's the advantage? ADD2 only needs two arguments instead of three, and INC even needs just a single one. So these could be encoded more compactly on disk and/or in memory. Since it is also easy to transform either form into the other, the decoding step could convert between both ways of expressing these statements. I'm not sure how much either form influences execution speed, though.
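
To illustrate what such a conversion could look like, here is a minimal sketch in C. The byte layout, opcode values, and helper name are assumptions of mine, not part of the question:

#include <stdint.h>

/* Hypothetical byte-oriented encoding: one opcode byte followed by one byte
 * per operand.  The opcode names, values, and layout are illustrative only.
 *
 *   ADD  r1, r2, r3   -> 4 bytes
 *   ADD2 r1, r2       -> 3 bytes   (r2 = r1 + r2)
 *   INC  r1           -> 2 bytes   (r1 = r1 + 1)
 */
enum { OP_ADD = 0, OP_ADD2 = 1, OP_INC = 2 };

/* A loader could canonicalize the compact form back into the general
 * three-operand form while decoding, so the execution core only has to
 * implement OP_ADD: */
static void expand_add2(const uint8_t in[3], uint8_t out[4])
{
    out[0] = OP_ADD;
    out[1] = in[1];   /* first source register             */
    out[2] = in[2];   /* second source, doubles as target  */
    out[3] = in[2];
}

The compact forms then only cost disk/memory space decisions, while the interpreter core stays small.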

Combining Two Opcodes Into a Single One

Now let's assume you have an ADD_RRR (R for register) and a LOAD to load data into a register.

LOAD value, R2
ADD_RRR R1, R2, R3

You can have these two opcodes and always use constructs like this throughout your code... or you can combine them into a single new opcode, named ADD_RMR (M for memory)

ADD_RMR R1, value, R3 
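
Just to show what the fused form buys an interpreter, here is a minimal sketch; the register file, the constant pool, and the handler signatures are assumptions of mine, not part of the question:

#include <stdint.h>

typedef struct {
    int32_t reg[16];
    int32_t consts[256];   /* immediate values referenced by index */
} VM;

/* LOAD value, R2 ; ADD_RRR R1, R2, R3  -- two dispatches, one scratch register */
static void load(VM *vm, uint8_t idx, uint8_t dst)
{
    vm->reg[dst] = vm->consts[idx];
}

static void add_rrr(VM *vm, uint8_t a, uint8_t b, uint8_t dst)
{
    vm->reg[dst] = vm->reg[a] + vm->reg[b];
}

/* ADD_RMR R1, value, R3  -- a single dispatch, no scratch register needed */
static void add_rmr(VM *vm, uint8_t a, uint8_t idx, uint8_t dst)
{
    vm->reg[dst] = vm->reg[a] + vm->consts[idx];
}

The fused opcode trades one extra case in the interpreter for one fewer dispatch and one fewer scratch register per occurrence.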

Data Types vs Opcodes

Assume you have 16-bit integers and 32-bit integers as native types. Registers are 32 bits, so either data type fits. Now when you add two registers, you could make the data type a parameter:

ADD int16, R1, R2, R3
ADD int32, R1, R2, R3

The same is true for signed and unsigned integers, for example. That way ADD can be a short opcode, one byte, followed by another byte (or maybe just 4 bits) telling the VM how to interpret the registers (do they hold 16-bit or 32-bit values?). Or you can scrap the type encoding and instead have two opcodes:

ADD16 R1, R2, R3
ADD32 R1, R2, R3

Some may say both are exactly the same: just interpreting the first form (opcode byte plus type byte) as 16-bit opcodes would work. Yes, but a very naive interpreter might look quite different. E.g. if it has one function per opcode and dispatches using a switch statement (not the best way of doing it, function call overhead, and the switch statement may not be optimal either, I know), the two opcodes could look like this:

case ADD16: add16(p1, p2, p3); break; // pX: pointer to register
case ADD32: add32(p1, p2, p3); break;

and each function is centered around a certain kind of add. The second one, though, may look like this:

case ADD: add(type, p1, p2, p3); break;
// ...

// and the function
void add (enum Type type, Register p1, Register p2, Register p3) {
    switch (type) {
        case INT16: // ...
        case INT32: // ...
    }
}

This adds a sub-switch to the main switch, or a sub dispatch table to the main dispatch table. Of course an interpreter can be written either way regardless of whether the types are explicit or not, but one of the two will feel more natural to developers depending on the opcode design.
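
To make the dispatch-table variant concrete, here is a minimal sketch; the function-pointer table layout and the type encoding (0 = int16, 1 = int32) are assumptions of mine, not from the question:

#include <stdint.h>

typedef struct { int32_t reg[16]; } VM;
typedef void (*OpFn)(VM *vm, int a, int b, int d);

/* "Many opcodes" style: one handler per typed operation, both of which sit
 * directly in the main dispatch table. */
static void add16(VM *vm, int a, int b, int d)
{
    vm->reg[d] = (int16_t)vm->reg[a] + (int16_t)vm->reg[b];
}

static void add32(VM *vm, int a, int b, int d)
{
    vm->reg[d] = vm->reg[a] + vm->reg[b];
}

/* "One ADD opcode plus a type operand" style: a single handler does a second
 * lookup in a sub dispatch table keyed on the type byte. */
static const OpFn add_by_type[] = { add16, add32 };

static void add(VM *vm, int type, int a, int b, int d)
{
    add_by_type[type](vm, a, b, d);
}

Either design can emulate the other; the point is only which one falls out naturally from the opcode set.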

Meta Opcodes

For lack of a better name, I'll call them that. These opcodes have no meaning at all on their own; they just change the meaning of the opcode that follows, like the famous WIDE operator:

ADD R1, R2, R3
WIDE ADD R1, R2, R3

E.g. in the second case the register operands are 16 bits (so you can address more registers), in the first one only 8. Alternatively, you can do without such a meta opcode and have both an ADD and an ADD_WIDE opcode. Meta opcodes like WIDE avoid having a SUB_WIDE, MUL_WIDE, etc., since you can always prefix any other normal opcode with WIDE (always just one extra opcode). The disadvantage is that an opcode alone becomes meaningless; you always have to check whether the opcode before it was a meta opcode or not. Further, the VM must store extra state per thread (e.g. whether it is currently in wide mode) and clear that state again after the next instruction. Even CPUs have such opcodes (e.g. the x86 LOCK prefix).
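
Here is a minimal sketch of how an interpreter loop might handle such a prefix; the opcode values, the bytecode layout, and the big-endian wide register encoding are all assumptions of mine:

#include <stddef.h>
#include <stdint.h>

/* A WIDE prefix widens the register operands of the immediately following
 * instruction from 8 to 16 bits. */
enum { OP_ADD = 0x01, OP_WIDE = 0xFF };

static uint16_t fetch_reg(const uint8_t **pc, int wide)
{
    uint16_t r = *(*pc)++;
    if (wide)
        r = (uint16_t)((r << 8) | *(*pc)++);
    return r;
}

void run(const uint8_t *pc, size_t len, int32_t *reg)
{
    const uint8_t *end = pc + len;
    int wide = 0;                      /* the extra per-thread state */
    while (pc < end) {
        uint8_t op = *pc++;
        switch (op) {
        case OP_WIDE:
            wide = 1;                  /* only affects the next opcode */
            continue;
        case OP_ADD: {
            uint16_t a = fetch_reg(&pc, wide);
            uint16_t b = fetch_reg(&pc, wide);
            uint16_t d = fetch_reg(&pc, wide);
            reg[d] = reg[a] + reg[b];
            break;
        }
        }
        wide = 0;                      /* prefix consumed after one instruction */
    }
}

Note how the prefix leaks into the main loop: every instruction has to reset the wide flag, which is exactly the extra per-thread state mentioned above.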

How to Find a Good Trade-Off?

Of course, the more opcodes you have, the bigger the switches/dispatch tables become and the more bits you need to express these codes on disk or in memory (though you can maybe store them more efficiently on disk, where the data doesn't have to be directly executable by the VM); the VM also becomes more complicated and has more lines of code. On the other hand, the opcodes become more powerful: you get closer to the point where every expression, even a complex one, ends up as one opcode.

Choosing few opcodes makes it easy to code the VM and will lead to very compact opcode encodings, I guess. On the other hand, it means you may need a very large number of instructions to perform a simple task, and every expression that is not used extremely often will have to become a (native) function call of some kind, as no opcode can be used for it.

I have read a lot about all kinds of VMs on the Internet, but no source really made a good and fair trade-off either way. Designing a VM is like designing a CPU: there are CPUs with few opcodes; they are fast, but you also need many instructions. And there are CPUs with many opcodes, some of them very slow, but you need far fewer of them to express the same piece of code. It looks like the "more opcodes are better" CPUs have totally won the consumer market and the "fewer opcodes are better" ones only survive in some parts of the server market or the supercomputer business. What about VMs?

Mecki asked Jun 09 '09




2 Answers

To be honest, I think it's largely a matter of the purpose of the VM, similar to how the processor design is largely determined by how the processor is primarily meant to be used.

In other words, you'll preferably be able to determine common use case scenarios for your VM, so that you can establish features that are likely going to be required, and also establish those that are unlikely to be very commonly required.

Of course, I do understand that you are probably envisioning an abstract, very generic virtual machine that can be used as the internal/backend implementation for other programming languages.

However, I feel it's important to realize and to emphasize that there really is no such thing as a "generic ideal" implementation of anything; i.e. once you keep things generic and abstract, you will inevitably face situations where you need to make compromises.

Ideally, these compromises will be based on real life use scenarios for your code, so that these compromises are actually based on well-informed assumptions and simplifications that you can make without going out on a limb.

In other words, I would think about what are the goals for your VM? How is it primarily going to be used in your vision? What are the goals you want to achieve?

This will help you come up with requirements and help you make simplifications, so that you can design your instruction set based on reasonable assumptions.

If you expect your VM to be primarily used by programming languages for number crunching, you'll probably want a fairly powerful foundation of math operations, providing lots of low-level primitives with support for wide data types.

If, on the other hand, it will serve as the backend for OO languages, you will want to look into optimizing the corresponding low-level instructions (e.g. hashes/dictionaries).

In general, I would recommend keeping the instruction set as simple and intuitive as possible in the beginning, and only adding special instructions once you have proven (i.e. via profiling and opcode dumps) that having them in place is indeed useful and does yield a performance gain. So this will largely be determined by the very first "customers" your VM will have.

If you are really eager to research more involved approaches, you could even look into dynamically optimizing the instruction set at runtime, using pattern matching to find common sequences of opcodes in your bytecode, in order to derive more abstract implementations, so that you can transform your bytecode dynamically into custom, runtime-generated opcodes.
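
As a concrete starting point for that pattern-matching idea, here is a minimal sketch that counts adjacent opcode pairs in a bytecode stream, so the most frequent pairs become candidates for fused superinstructions. The one-opcode-byte-per-instruction layout is an assumption for the example:

#include <stddef.h>
#include <stdint.h>

#define NUM_OPCODES 256

/* Tally how often each ordered pair of adjacent opcodes occurs. */
void count_opcode_pairs(const uint8_t *code, size_t len,
                        uint32_t counts[NUM_OPCODES][NUM_OPCODES])
{
    for (size_t i = 0; i + 1 < len; i++)
        counts[code[i]][code[i + 1]]++;
}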

answered Sep 21 '22


For software performance it's easier if all opcodes are the same length, so you can have one gigantic switch statement and not have to examine various option bits that might have been set by preceding modifier opcodes.
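
A minimal sketch of that idea follows; the 32-bit instruction word layout (8 bits of opcode plus three 8-bit operands) and the opcode values are assumptions of mine:

#include <stdint.h>

enum { OP_ADD = 0, OP_HALT = 1 };

/* Every instruction is one fixed-size word, so the main loop never has to
 * look back for prefix/modifier opcodes. */
void run_fixed(const uint32_t *code, int32_t *reg)
{
    for (const uint32_t *pc = code;; pc++) {
        uint32_t insn = *pc;
        uint8_t op = (uint8_t)(insn >> 24);
        uint8_t a  = (uint8_t)(insn >> 16);
        uint8_t b  = (uint8_t)(insn >> 8);
        uint8_t d  = (uint8_t)insn;
        switch (op) {
        case OP_ADD:  reg[d] = reg[a] + reg[b]; break;
        case OP_HALT: return;
        }
    }
}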

Two matters that I think you didn't ask about are ease of writing compilers that translate programming languages to your VM code and ease of writing interpreters that execute your VM code. Both of these are easier with fewer opcodes. (But not too few. For example if you omit a divide opcode then you get an opportunity to learn how to code good division functions. Good ones are far harder than simple ones.)

Windows programmer answered Sep 21 '22