Writing a Python compiler for practice [closed]

Recently I've been reading quite a bit about CPUs and architectures: mainly opcodes, integrated circuits, etc. I've been a Python developer for a few years, and I'd like to get some practice in writing machine code.

I thought that, for fun, I'd compile a very simple Python script into machine code as a way to practice. The script is as follows:

a = 2
b = 3
c = a + b
print c

I'm writing the compiler in Python, because I'm not as good at C as I am at Python. I've looked around a little, and I have the following Python libraries at my disposal, which might help:

binascii.hexify(hex(2))  <-- should convert 2 to binary, correct?

file = open('/usr/local/bin/my_sample_program','wb') <-- should write the resulting binary file

I still have to find the opcodes for the Intel Core i5, but that should be easy too.

My question is as follows:

1) How do I write the opcode to the file? In other words, assuming the opcode for setting a register to contain the value 2 is 0010, how do I write this as the first four numbers in the program's first line of execution?

2) How do I tell the OS, either OS X or Ubuntu, to load the program into physical memory? I'm assuming that the first thing a compiler does is write instructions for the OS onto the resulting binary file?

3) Any resources that you might know of that can help me would be appreciated.

Sam Hammamy asked Jan 13 '13 17:01


1 Answer

That is quite a project you are planning there. In addition to learning how a compiler works, you also need to read up on loadable file formats like ELF, and tons of information on operating-system details.

I suggest that your compiler emit an assembly file as its output. You can then use an existing assembler to convert that file into machine code. In fact, this is what most C compilers (including GCC) do "under the surface".
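To make that concrete, here is a minimal sketch of such a compiler for the four-line script in the question. It rests on several assumptions that are not from the question: the target is x86-64 Linux with a GNU toolchain, the output uses AT&T syntax, printing is delegated to the C library's printf, and only the three statement forms in the sample are recognised. The names compile_to_asm, SAMPLE and the var_ prefix are made up for illustration.

```python
import re

NAME = r"[A-Za-z_]\w*"
ASSIGN_CONST = re.compile(rf"^({NAME})\s*=\s*(\d+)$")
ASSIGN_ADD = re.compile(rf"^({NAME})\s*=\s*({NAME})\s*\+\s*({NAME})$")
PRINT = re.compile(rf"^print\s+({NAME})$")

SAMPLE = "a = 2\nb = 3\nc = a + b\nprint c\n"


def compile_to_asm(source):
    """Translate the tiny language into x86-64 assembly text (AT&T syntax)."""
    variables = []  # variable names, in order of first assignment
    body = []       # instructions that end up inside main

    def require(name, lineno):
        if name not in variables:
            raise NameError(f"line {lineno}: {name!r} used before assignment")

    def declare(name):
        if name not in variables:
            variables.append(name)

    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if m := ASSIGN_CONST.match(line):
            name, value = m.groups()
            declare(name)
            # Assumption: the constant fits in a signed 32-bit immediate.
            body.append(f"    movq ${value}, var_{name}(%rip)")
        elif m := ASSIGN_ADD.match(line):
            target, left, right = m.groups()
            require(left, lineno)
            require(right, lineno)
            declare(target)
            body.append(f"    movq var_{left}(%rip), %rax")
            body.append(f"    addq var_{right}(%rip), %rax")
            body.append(f"    movq %rax, var_{target}(%rip)")
        elif m := PRINT.match(line):
            name = m.group(1)
            require(name, lineno)
            body.append("    leaq fmt(%rip), %rdi")        # 1st arg: format
            body.append(f"    movq var_{name}(%rip), %rsi")  # 2nd arg: value
            body.append("    xorl %eax, %eax")  # no vector args to printf
            body.append("    call printf@PLT")
        else:
            raise SyntaxError(f"line {lineno}: cannot compile {line!r}")

    lines = ["    .section .rodata", "fmt:", '    .string "%ld\\n"', "    .data"]
    for name in variables:
        # The var_ prefix keeps user names from clashing with main or printf.
        lines += [f"var_{name}:", "    .quad 0"]
    lines += ["    .text", "    .globl main", "main:",
              "    pushq %rbp", "    movq %rsp, %rbp"]
    lines += body
    lines += ["    xorl %eax, %eax", "    popq %rbp", "    ret",
              '    .section .note.GNU-stack,"",@progbits']
    return "\n".join(lines) + "\n"


print(compile_to_asm(SAMPLE))
```

If the output is saved as prog.s, something like `gcc prog.s -o prog` should assemble and link it on such a system, with gcc running the assembler and linker and pulling in the C library. OS X would need changes this sketch does not cover, for example a leading underscore on symbol names such as _main and _printf.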

EDIT: The output of a compiler or an assembler is typically an object file. This is later combined with other object files by a linker. Writing the entire tool chain (compiler, assembler, linker, and other associated tools) would easily take multiple man-years. In this light, I don't think that you should see a straightforward solution like using an existing assembler and linker as cheating.
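On the narrower question of how to get opcode bytes into a file from Python, a minimal sketch follows. It assumes x86, where `mov eax, 2` is encoded as the byte 0xB8 followed by the value as a 32-bit little-endian integer. The helper name and output file name are hypothetical. Note that the function in binascii is called hexlify, and it turns raw bytes into hex text rather than turning a number into binary.

```python
import binascii
import os
import struct
import tempfile


def encode_mov_eax_imm32(value):
    """Encode `mov eax, imm32`: opcode byte 0xB8, then the value as a
    32-bit little-endian unsigned integer."""
    return struct.pack("<BI", 0xB8, value)


code = encode_mov_eax_imm32(2)

# Hypothetical output path; a temporary directory keeps the sketch harmless.
path = os.path.join(tempfile.mkdtemp(), "my_sample_program.bin")
with open(path, "wb") as f:  # "wb" writes the bytes exactly as given
    f.write(code)

# hexlify is useful for inspecting what was written, not for producing it.
print(binascii.hexlify(code).decode("ascii"))
```

A file containing only these bytes is not yet something the OS will load. The loader expects them to be wrapped in an executable format such as ELF on Linux or Mach-O on OS X, which is the part an existing assembler and linker take care of.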

Lindydancer answered Sep 19 '22 06:09