
Emulating an x86 processor -- how should I start?

Tags: x86, emulation

Recently I learned that Fabrice Bellard created an implementation of QEMU (more or less, you get the idea) in JavaScript by emulating an i586. This caught my interest, as I have always been fascinated by the complexity that fits inside a microprocessor -- hence I wish to achieve the same task of implementing an x86 emulator. Obviously it won't be nearly as sophisticated, but that's not the main point.

I do have experience with operating systems and low-level programming, in addition to some really simple CHIP-8 emulator programming. I do know how simple emulators work. How can I take advantage of my current knowledge in this area when I go about developing an x86 emulator -- or is it too different?

Also, I would be grateful for any input from those of you who have already accomplished what I am eagerly attempting to do. Are there any books I should keep at my desk? (Note that I love books. I would highly appreciate recommendations.) Papers? Web sites I should know of? ... and so forth.

Thanks in advance.

rotalume asked May 18 '11 07:05


1 Answer

First off, what is your real goal? Are you interested in an accurate, educational type of simulator, or are you trying for a vmware, qemu/kvm type speed thing? The latter takes advantage of executing a percentage of the instructions on the host processor itself (not simulating). Even if you do not want to execute on the host, if you are interested in performance (with a possible sacrifice of accuracy, debuggability or fault checking), look at the mame sources; there is a long list of processor simulators there that are written for speed.

The one I mentioned above was written more for educational purposes, mine in particular and those of anyone else who may find it interesting. For that type of model I recommend a few key points. Abstract your memory read, write, and fetch operations (have a read_mem_8() function and a write_mem_8(), etc., which do the address decoding and so on, like the hardware does). Likewise abstract the register read/write operations into functions. The simulator centers around a function that executes a single instruction, which is called in a loop for a fixed number of instructions, or in an infinite loop, or somewhere in the middle, your choice. That way you can manage the switch between foreground and interrupt mode, or other modes, outside the function that handles the instruction decoding. The instruction decoder is not unlike a disassembler. It is simpler in the sense that for variable-length instruction sets (like the x86) you do not have to figure out a way to search through the bytes looking for the start of an instruction; by executing, you assume that the binary is real and the code executes. Naturally, though, you need to have an undefined instruction handler of some sort.
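To make that structure concrete, here is a minimal sketch in C, not a definitive design. Only the names read_mem_8() and write_mem_8() come from the text above; struct cpu, step(), run(), load_program(), cpu_reset() and the 64 KiB flat memory are assumptions made up for illustration. It decodes only five one-byte opcodes as they are encoded in 32-bit mode (NOP, HLT, INC r32, MOV r32,imm32, JMP rel8), ignores flags, prefixes and ModRM entirely, and reports everything else as undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x10000u /* hypothetical 64 KiB of flat RAM */

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, NUM_REGS };
enum step_result { STEP_OK, STEP_HALT, STEP_UNDEFINED };

struct cpu {
    uint32_t regs[NUM_REGS];
    uint32_t eip;
    uint8_t mem[MEM_SIZE];
};

/* All memory traffic goes through these two functions; address decoding
   for ROM or memory-mapped devices would be added here. */
static uint8_t read_mem_8(struct cpu *c, uint32_t addr) {
    return c->mem[addr % MEM_SIZE]; /* wraps instead of faulting (sketch only) */
}
static void write_mem_8(struct cpu *c, uint32_t addr, uint8_t v) {
    c->mem[addr % MEM_SIZE] = v;
}
static uint32_t read_mem_32(struct cpu *c, uint32_t addr) {
    /* x86 is little-endian: lowest address holds the low byte */
    return (uint32_t)read_mem_8(c, addr)
         | ((uint32_t)read_mem_8(c, addr + 1) << 8)
         | ((uint32_t)read_mem_8(c, addr + 2) << 16)
         | ((uint32_t)read_mem_8(c, addr + 3) << 24);
}
static uint8_t fetch_8(struct cpu *c) { return read_mem_8(c, c->eip++); }
static uint32_t fetch_32(struct cpu *c) {
    uint32_t v = read_mem_32(c, c->eip);
    c->eip += 4;
    return v;
}

/* Register access is abstracted too, so tracing can be hooked in later. */
static uint32_t read_reg(struct cpu *c, unsigned idx) {
    assert(idx < NUM_REGS);
    return c->regs[idx];
}
static void write_reg(struct cpu *c, unsigned idx, uint32_t v) {
    assert(idx < NUM_REGS);
    c->regs[idx] = v;
}

/* Execute exactly one instruction. */
enum step_result step(struct cpu *c) {
    uint8_t op = fetch_8(c);
    if (op == 0x90) return STEP_OK;            /* NOP */
    if (op == 0xF4) return STEP_HALT;          /* HLT */
    if (op >= 0x40 && op <= 0x47) {            /* INC r32 (flags not modeled) */
        unsigned r = op - 0x40u;
        write_reg(c, r, read_reg(c, r) + 1);
        return STEP_OK;
    }
    if (op >= 0xB8 && op <= 0xBF) {            /* MOV r32, imm32 */
        write_reg(c, op - 0xB8u, fetch_32(c));
        return STEP_OK;
    }
    if (op == 0xEB) {                          /* JMP rel8 */
        uint32_t rel = fetch_8(c);
        if (rel & 0x80u) rel |= 0xFFFFFF00u;   /* sign-extend the displacement */
        c->eip += rel;                         /* relative to the next instruction */
        return STEP_OK;
    }
    return STEP_UNDEFINED;                     /* undefined instruction handler hook */
}

/* Outer loop: interrupts or mode changes could be checked between steps. */
enum step_result run(struct cpu *c, unsigned long max_steps) {
    for (unsigned long i = 0; i < max_steps; i++) {
        enum step_result r = step(c);
        if (r != STEP_OK) return r;
    }
    return STEP_OK;
}

void cpu_reset(struct cpu *c) { memset(c, 0, sizeof *c); }

void load_program(struct cpu *c, uint32_t addr, const uint8_t *code, size_t len) {
    for (size_t i = 0; i < len; i++) write_mem_8(c, addr + (uint32_t)i, code[i]);
}
```

As a usage example, loading the bytes B8 05 00 00 00 40 40 EB 01 F4 41 F4 at address 0 and calling run() with a budget of 100 steps should stop at the final HLT with EAX equal to 7 and ECX equal to 1; the JMP skips over the first HLT.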

x86 would not be my first choice, for a long list of reasons. Here again, what are your goals? The x86 is going to have 32/64 bit modes, memory protection schemes, a number of execution modes, etc. I would (and have many times) start with a simpler instruction set for the first few times: msp430, pic (older pic, not dspic nor pic32), 6502, etc. There are a number of 6502 roms in the mame world that you can play with (note that some of the 6502 simulators out there have bugs). The msp430 and pic have few instructions and are an afternoon project once you get the swing of things. An arm might be a stepping stone to x86 if you still feel you really need to do an x86. It has various execution modes, and you can simulate the known mmu and fpu as desired and boot linux, windows, etc.

Re-reading your question, I may have oversimplified my answer; it sounds like you have some experience. x86 is no different from any other processor in that you need to compile some simple binaries, ones that count and loop for example, and attack those binaries: decode and execute, then increase the complexity of your test programs, adding support for more instructions to your simulator. At some point that gets boring and it is time for the long-haul typing session, going through and implementing all of the instructions (without necessarily testing each one). Then go back and try to execute more complicated binaries (to try to test all of the new instructions).

I tend to use self-checking tests, like compressing some data and then decompressing it using an open source package (compiled for embedded) and comparing the input and output. Encryption routines are good as well: aes, des, etc. Hashes like md5 and sha are not self-checking, but you can pre-compute the answer on the host platform and hardcode the answer in the test. Open source jpeg, png, and mp3 decoders work too; there are fixed-point jpeg and mp3 decoders, or you can go with a soft fpu.

Different compilers produce different instruction mixes, and some compilers don't use certain instructions or instruction sequences at all, so I highly recommend taking these test programs and re-compiling and running them both with several different optimization settings and with as many compilers as you can get for that processor. Different high-level languages should also produce different instruction mixes. You may find that a single programmer using a single language and a single compiler is only going to give you limited coverage; the individual has specific programming habits and styles that limit the variety in the output. The same goes for assembler: an individual is going to have a limited instruction mix that they generate.
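As a rough illustration of the "pre-compute the answer on the host and hardcode it" idea, here is a small sketch of such a test program in C. Adler-32 is swapped in for md5 or sha purely to keep the example short; the names adler32(), self_test(), test_input and EXPECTED_ADLER32 are made up for this example. It is written as ordinary hosted C; a version compiled for a bare simulator would drop the assert and report the result however your simulator exposes it (a register, a memory location, a halt code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32 checksum, standing in for a larger workload such as md5 or sha. */
uint32_t adler32(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u; /* 65521 is the largest prime below 2^16 */
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Fixed input and the answer computed ahead of time on the host. */
static const uint8_t test_input[] = { 'W', 'i', 'k', 'i', 'p', 'e', 'd', 'i', 'a' };
#define EXPECTED_ADLER32 0x11E60398u

/* Returns 0 on pass and 1 on fail, so the simulator only has to check
   one value when the program finishes. */
int self_test(void) {
    return adler32(test_input, sizeof test_input) == EXPECTED_ADLER32 ? 0 : 1;
}
```

The point of the design is that a wrong result from almost any instruction in the loop (add, modulo, shift, load) changes the final checksum, so one comparison covers many simulated instructions. Building the same source with different compilers and optimization levels then exercises different instruction mixes against the same expected value.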

old_timer answered Nov 10 '22 06:11