Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Writing a VM - well formed bytecode?

I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO so hopefully no one will make fun :)

I wrote a really quick'n'dirty VM that reads lines of (my own) ASM and does stuff. Right now, I only have 3 instructions: add, jmp, end. All is well and it's actually pretty cool being able to feed lines (doing it something like write_line(&prog[1], "jmp", regA, regB, 0); and then running the program:

while (machine.code_pointer <= BOUNDS && DONE != true)
{
    run_line(&prog[machine.cp]);
}

I'm using an opcode lookup table (which may not be efficient but it's elegant) in C and everything seems to be working OK.

My question is more of a "best practices" question but I do think there's a correct answer to it. I'm making the VM able to read binary files (storing bytes in unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?

I only ask this because what would happen if someone would edit a binary file and screw stuff up (delete arbitrary parts of it, etc). Clearly, the program would be buggy and probably not functional. Is this even the VM's problem? I'm sure that people much smarter than me have figured out solutions to these problems, I'm just curious what they are!

like image 766
David Titarenco Avatar asked May 11 '10 23:05

David Titarenco


People also ask

What is VM bytecode?

Bytecode is computer object code that an interpreter converts into binary machine code so it can be read by a computer's hardware processor. The interpreter is typically implemented as a virtual machine (VM) that translates the bytecode for the target platform.

Does a virtual machine execute bytecode?

Machine code is a set of instructions in machine language or in binary format and it is directly executed by CPU. 04. Byte code is executed by the virtual machine then the Central Processing Unit.

How do I write bytecode in python?

CPython compiles the python source code into the bytecode, and this bytecode is then executed by the CPython virtual machine. All the generated pyc files will be stored in the __pycache__ folder. If you provide no file names after compileall, it will compile all the python source code files in the current folder.

Which programming language is compiled to byte code to run on the virtual machine?

Programs written in Java are compiled into machine language, but it is a machine language for a computer that doesn't really exist. This so-called "virtual" computer is known as the Java Virtual Machine, or JVM. The machine language for the Java Virtual Machine is called Java bytecode.


1 Answers

Is it the VM's job to make sure the bytecode is well formed or is it just the compiler's job to make sure the binary file it spits out is well formed?

You get to decide.

Best practice is to have the VM do a single check before execution, cost proportional to the size of the program, which is sophisticated enought to guarantee that nothing wonky can happen during execution. Then during actual execution of the bytecode, you run with no checks. However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often have some checks at run time (example: array bounds).

For a hobby project, I'd keep things simple and have the VM check sanity every time you execute an instruction. The overhead for most instructions won't be too great.

like image 106
Norman Ramsey Avatar answered Oct 26 '22 12:10

Norman Ramsey