I'd like to make a simple x86 assembler. Are there any tutorials on writing your own assembler, or is there a simple assembler that I could study?
Also, what tools are used for viewing and manipulating the binary/hex of programs?
An assembler is a program that converts instructions written in low-level assembly code into relocatable machine code, along with the information the loader needs. It generates instructions by evaluating the mnemonics in the operation field and resolving the values of symbols and literals to produce machine code.
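To make the "find the value of symbols" step concrete, here is a minimal two-pass sketch in Python. The toy instruction set (only `nop` and `jmp <label>`) and the function name `assemble` are assumptions chosen for illustration; a real assembler handles far more. Pass 1 assigns an address to every label, and pass 2 emits bytes once every symbol has a value.

```python
import struct

# Toy instruction set (an assumption for illustration): `nop` and `jmp <label>`.
NOP = b"\x90"       # x86 NOP, 1 byte
JMP_REL32 = 0xE9    # x86 JMP rel32: 1 opcode byte + 4-byte displacement


def assemble(source: str) -> bytes:
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]

    # Pass 1: walk the program and record the address of every label.
    symbols = {}
    address = 0
    for ln in lines:
        if ln.endswith(":"):
            symbols[ln[:-1]] = address
        elif ln == "nop":
            address += 1
        elif ln.startswith("jmp "):
            address += 5
        else:
            raise ValueError(f"unsupported line: {ln!r}")

    # Pass 2: emit machine code, now that every symbol has a value.
    out = bytearray()
    for ln in lines:
        if ln.endswith(":"):
            continue
        if ln == "nop":
            out += NOP
        else:
            target = symbols[ln[4:].strip()]
            # rel32 is measured from the address of the *next* instruction.
            displacement = target - (len(out) + 5)
            out += bytes([JMP_REL32]) + struct.pack("<i", displacement)
    return bytes(out)
```

Two passes are needed because a forward reference such as `jmp end` is seen before `end:` has an address.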
Writing an assembler in assembly language, or even in machine code, is just tedious. If the purpose of the task is to understand how assembly language is translated to machine code, it's fine to use a higher-level language such as C, Python, or Java, as long as you understand the theory behind it.
There's nothing special about how an assembler is written. All it does is parse an assembly syntax and spit out machine code for a particular architecture. If your preferred programming language can read text and write binary, you can create an assembler with it.
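As a rough illustration of "read text, write binary", here is a small Python sketch. It only handles three instructions of the form `<op> eax, <integer>`, using the x86 short-form opcodes B8 (mov), 05 (add) and 25 (and), each followed by a little-endian 32-bit immediate. The names `OPCODES`, `encode_line` and `assemble` are made up for this example.

```python
import struct

# Opcode bytes for the "<op> eax, imm32" short forms on x86.
OPCODES = {"mov": 0xB8, "add": 0x05, "and": 0x25}


def encode_line(line: str) -> bytes:
    """Encode one '<op> eax, <integer>' line as opcode + little-endian imm32."""
    mnemonic, _, operands = line.strip().partition(" ")
    dest, _, imm = operands.partition(",")
    mnemonic = mnemonic.lower()
    if mnemonic not in OPCODES or dest.strip().lower() != "eax":
        raise ValueError(f"unsupported instruction: {line!r}")
    # int(..., 0) accepts decimal and 0x-prefixed hex; mask negatives to 32 bits.
    value = int(imm.strip(), 0) & 0xFFFFFFFF
    return bytes([OPCODES[mnemonic]]) + struct.pack("<I", value)


def assemble(source: str) -> bytes:
    """Assemble every non-empty line of the source text."""
    return b"".join(encode_line(ln) for ln in source.splitlines() if ln.strip())
```

The result is raw machine code only. To run it, it still has to be placed inside an executable container (PE, ELF, and so on) or loaded by something that knows what to do with it.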
This is what you are looking for:
Assemblers and Loaders, by David Salomon. Published February 1993. Freely available.
Of course, you are going to need the following:
You can always refer to the implementations of open-source assemblers:
Just a very tiny piece of code in Delphi 7.
{$APPTYPE CONSOLE}
program assembler;

uses
  sysutils;

const
  // Executable template, stored as (skip, byte) pairs; see procedure w.
  s1=#0#77#1#90#59#64#4#80#1#69#3#76#1#1#1#1#14#224#2#15#1#1#1#11#1#1#1#1#1#64#13+
     #116#1#16#13#64#3#16#4#2#3#1#8#3#2#10#7#32#4#2#7#3#5#16#4#16#5#1#10#16#13#16#3+
     #184#124#184#5#16#3#184#5#2#15#96#3#224#173#52#1#16#3#40#1#16#23#65#1#16#3#80#1+
     #16#7#75#1#69#1#82#1#78#1#69#1#76#1#51#1#50#1#46#1#68#1#76#1#76#4#71#1#101#1+
     #116#1#83#1#116#1#100#1#72#1#97#1#110#1#100#1#108#1#101#4#87#1#114#1#105#1#116+
     #1#101#1#67#1#111#1#110#1#115#1#111#1#108#1#101#1#65#2#72#1#101#1#108#1#108#1+
     #111#1#44#1#32#1#87#1#111#1#114#1#108#1#100#1#33#1#13#1#10#5#0;
  // Trailing code appended after the assembled instructions (prints eax in hex),
  // stored in the same (skip, byte) format.
  s3=#1#185#1#7#4#136#1#195#1#128#1#227#1#15#1#193#1#216#1#4#1#128#1#251#1#9+
     #1#118#1#3#1#128#1#195#1#39#1#128#1#195#1#48#1#136#1#153#1#96#1#16#1#64#2#73#1+
     #125#1#228#1#106#2#104#1#112#1#16#1#64#2#106#1#8#1#104#1#96#1#16#1#64#2#106#1+
     #245#1#255#1#21#1#40#1#16#1#64#2#80#1#255#1#21#1#44#1#16#1#64#2#195;

var
  f: file of byte;
  p, i: integer;
  o: string;
  t: text;
  line: string;

// Expands (skip, byte) pairs into o: advance p by the skip, then store the byte at o[p].
procedure w(s: string);
begin
  i:=1;
  while i<length(s) do
  begin
    inc(p,ord(s[i]));
    setlength(o, p);
    o[p]:=s[i+1];
    inc(i,2);
  end;
end;

// Appends opcode b followed by the 32-bit immediate parsed from line (little-endian).
procedure al(b: byte);
var
  a: longword;
  pc: pchar;
begin
  a := strtoint(line);
  pc:=@a;
  o := o + chr(b) + pc^ + (pc+1)^ + (pc+2)^ + (pc+3)^;
  inc(p,5); // 1 opcode byte + 4 immediate bytes
end;

begin
  assign(f,'out.exe');
  rewrite(f);
  p:=1;
  w(s1);
  assignfile(t, ''); // empty file name: read the source from standard input
  reset(t);
  while not eof(t) do
  begin
    readln(t, line);
    line := trim(line);
    if copy(line,1,8) = 'mov eax,' then
    begin
      system.delete(line,1,8);
      al($b8); // mov eax, imm32
    end
    else if copy(line,1,8) = 'add eax,' then
    begin
      system.delete(line,1,8);
      al($05); // add eax, imm32
    end
    else if copy(line,1,8) = 'and eax,' then
    begin
      system.delete(line,1,8);
      al($25); // and eax, imm32
    end
  end;
  closefile(t);
  w(s3);
  blockwrite(f,o[1],p);
  close(f);
end.
The assembler understands only three instructions, "mov eax,immed32", "add eax,immed32" and "and eax,immed32", and supports neither data nor labels. It produces a tiny Windows PE executable that prints the value of eax in hex at the end.
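The least obvious part of the listing is how the constants s1 and s3 are stored: procedure w treats them as (skip, byte) pairs, so runs of untouched bytes in the executable template don't have to be spelled out. Below is a rough Python equivalent of that expansion. The name `expand` is made up, and zero-filling the skipped bytes is an assumption of this sketch.

```python
def expand(pairs: bytes, out: bytearray, pos: int) -> int:
    """Apply (skip, byte) pairs to `out`.

    `pos` is a 1-based write position, as in the Delphi code: each pair
    first advances `pos` by `skip`, then stores `byte` at that position.
    Returns the updated position.
    """
    for i in range(0, len(pairs) - 1, 2):
        pos += pairs[i]
        if len(out) < pos:
            # Grow the buffer; skipped bytes are zero-filled (assumption).
            out.extend(bytes(pos - len(out)))
        out[pos - 1] = pairs[i + 1]
    return pos


# The first three pairs of s1: #0#77 #1#90 #59#64
image = bytearray()
position = expand(bytes([0, 77, 1, 90, 59, 64]), image, 1)
```

With those three pairs, `image` starts with the bytes "MZ" and has the value 0x40 at offset 0x3C, which is where the DOS header stores the offset of the PE header.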
Attention: in my case Avira Free Antivirus doesn't like the output. It's a false positive; I had to switch off the real-time protection. If you are unsure whether the output is malware (it's not!), check the result with a debugger.
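For looking at the bytes of a generated file, a hex editor or a disassembler is the usual choice. If you just want a quick look from a script, a hex dump is easy to write yourself. This is a minimal sketch; the function name `hexdump` and the layout (offset, hex bytes, printable ASCII) are choices made for this example.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx " groups, minus the last space
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {text}")
    return "\n".join(rows)
```

For example, `print(hexdump(open("out.exe", "rb").read()))` would list the file produced by the Delphi program above, assuming it is in the current directory.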