I'd like to be able to write raw machine code, without assembly or any other higher level language, that can be put directly onto a flash drive and run. I already know that for this to work I need to write master boot record headers onto the drive, which I have managed to do manually. I have also successfully gotten a line of text to display on the screen using assembly code in the first sector (in this case, the first 512 bytes) of the drive. However, I would like to write raw hex code onto the drive, like I did for the MBR formatting, without a tool like an assembler to help me. I know there is a way to do this, but I haven't been able to find anything that doesn't mention assembly. Where can I find information about this? Googling machine code or x86 programming comes up with assembly, which isn't what I want.
Just to paint the picture...
First off, you are not going to find a how-to for programming in machine code that doesn't have assembly associated with it, and that should be obvious. Any decent instruction reference shows the assembly for some assembler alongside the machine code, because you need some way to refer to a bit pattern, and assembly language is that language.
So look up nop, for example, and you find the bit pattern 10010000, or 0x90. So if I want to add the instruction nop to my program I add the byte 0x90. Even going back to very early processors, you still wanted to program in assembly language, hand assemble with pencil and paper, and then use DIP switches to clock the program into memory before trying to run it, because it just makes sense. Decades later, even to demonstrate machine code programming, particularly with a painful instruction set like x86, you start with assembly, assemble, then disassemble, then talk about it. So here goes:
top:
mov ah,01h
jmp one
nop
nop
one:
add ah,01h
jmp two
two:
mov bx,1234h
nop
jmp three
jmp three
jmp three
three:
nop
jmp top
nasm -f aout so.s -o so.elf
objdump -D so.elf
00000000 <top>:
0: b4 01 mov $0x1,%ah
2: eb 02 jmp 6 <one>
4: 90 nop
5: 90 nop
00000006 <one>:
6: 80 c4 01 add $0x1,%ah
9: eb 00 jmp b <two>
0000000b <two>:
b: 66 bb 34 12 mov $0x1234,%bx
f: 90 nop
10: eb 04 jmp 16 <three>
12: eb 02 jmp 16 <three>
14: eb 00 jmp 16 <three>
00000016 <three>:
16: 90 nop
17: eb e7 jmp 0 <top>
So just the first couple of instructions describe the problem and why asm makes so much sense...
The first one you can easily program in machine code: b4 01 is mov ah,01h. We go to the overloaded mov instruction in the documentation and find immediate operand to register, 1011 w reg followed by data. We have one byte, so it is not a word and the w bit is not set. We look up reg to find ah (100), which gives 10110100 or 0xb4, and the immediate is 01h. Not that bad. But now the jump: I want to jump over some stuff, well, how much stuff? Which jump do I want to use? Do I want to be conservative and use the one with the fewest bytes?
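As a rough sketch of that lookup done in code, the following packs the 1011 w reg pattern for the byte-sized case only. The function and enum names are made up for illustration, and the register codes are the usual 8086 ones for 8 bit registers.

```c
#include <assert.h>
#include <stdint.h>

/* 3-bit codes for the 8 bit registers (used when w = 0). */
enum reg8 { REG_AL = 0, REG_CL = 1, REG_DL = 2, REG_BL = 3,
            REG_AH = 4, REG_CH = 5, REG_DH = 6, REG_BH = 7 };

/* Hypothetical helper: encode "mov reg8, imm8" as 1011 w reg, data.
   Writes 2 bytes into out and returns the number of bytes written. */
int encode_mov_r8_imm8(uint8_t out[2], enum reg8 reg, uint8_t imm)
{
    assert((unsigned)reg <= 7u);
    out[0] = (uint8_t)(0xB0u | (0u << 3) | (unsigned)reg); /* 1011, w = 0, reg */
    out[1] = imm;                                          /* the immediate byte */
    return 2;
}
```

Calling encode_mov_r8_imm8(buf, REG_AH, 0x01) fills buf with b4 01, the same two bytes the assembler produced in the listing.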
I can see that I want to jump over two instructions, and we can easily look up the nops to know they are one byte (0x90) instructions. So intra-segment direct short should work, as the assembler chose: 0xEB. But what is the offset? 0x02, to jump over the two BYTES of instructions between the end of the jump instruction and where I want to go.
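The same arithmetic as a small sketch, assuming you already know the address of the jump and the address of the target. The helper name is invented for this example; it only handles the two byte 0xEB form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a short jump (0xEB) located at jmp_addr that
   goes to target. The offset is counted from the byte after the two byte
   jump. Returns 1 on success, 0 if the distance does not fit a signed byte. */
int encode_jmp_short(uint8_t out[2], long jmp_addr, long target)
{
    assert(out != NULL);
    long rel = target - (jmp_addr + 2);
    if (rel < -128 || rel > 127)
        return 0;
    out[0] = 0xEB;
    out[1] = (uint8_t)(rel & 0xFF); /* two's complement for backward jumps */
    return 1;
}
```

With the addresses from the listing, a jump at 0x2 to 0x6 comes out as eb 02, and the backward jump at 0x17 to 0x0 comes out as eb e7.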
So you can go through the rest of the instructions I have assembled here, with the Intel documentation, to see what bytes the assembler chose and why.
Now, I am looking at the Intel 8086/8088 manual right now. The intra-segment direct short instruction is described as sign extended; the intra-segment direct one does not say sign extended. The processor at that time was 16 bits, but you had a few more bits of segment. So by only reading the manual, with no access to the design engineers and no debugged assembler for reference, how would I know whether I could have used the 16 bit direct jump for that last instruction, the one that branches backward? In this case the assembler chose the byte-sized offset, but what if...
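For the short form at least, the sign extension can be checked against the listing. This is a minimal sketch with a made-up function name; it says nothing about how the 16 bit form behaves, which is exactly the part the manual leaves you guessing about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the address of a two byte short jump and its
   offset byte, return the address it transfers to. The byte is sign
   extended, so values 0x80..0xFF mean a backward jump. */
long short_jmp_target(long jmp_addr, uint8_t rel8)
{
    assert(jmp_addr >= 0);
    long rel = (rel8 & 0x80) ? (long)rel8 - 256 : (long)rel8;
    return jmp_addr + 2 + rel; /* counted from the end of the jump */
}
```

Feeding it the last instruction of the listing, short_jmp_target(0x17, 0xe7), gives 0, which is top.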
I'm using a 16 bit manual but 32/64 bit tools (which is why mov bx,1234h picked up a 0x66 operand-size prefix in the listing above), so I have to consider that, but I could and did do this:
three:
nop
db 0xe9,0xe7,0xff,0xff,0xff
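instead of jmp top. Note that in the disassembly below the hand-written jump lands on top+0x3, not top: the offset is counted from the end of the instruction, and the five byte form ends three bytes later than the two byte form, so reusing 0xe7 overshoots by three. Reaching top from here would take 0xe9,0xe4,0xff,0xff,0xff.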
00000016 <three>:
16: 90 nop
17: e9 e7 ff ff ff jmp 3 <top+0x3>
For the 8086 that would have been 0xe9,0xe7,0xff: the same opcode followed by a 16 bit offset, low byte first.
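A sketch of the wider form, assuming opcode 0xE9 followed by a 2 or 4 byte little-endian offset measured from the end of the instruction. The function name is invented, and it does no range check for the 16 bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a near jump (0xE9) at jmp_addr going to target.
   disp_size is 2 for a 16 bit offset or 4 for a 32 bit offset.
   Returns the instruction length in bytes. */
int encode_jmp_near(uint8_t *out, long jmp_addr, long target, int disp_size)
{
    assert(disp_size == 2 || disp_size == 4);
    int len = 1 + disp_size;
    /* offset is relative to the first byte after the whole instruction */
    uint32_t rel = (uint32_t)(target - (jmp_addr + len));
    out[0] = 0xE9;
    for (int i = 0; i < disp_size; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i)); /* least significant byte first */
    return len;
}
```

At address 0x17 with a 32 bit offset, a target of 0x3 produces e9 e7 ff ff ff, matching the disassembly above, while a target of 0x0 produces e9 e4 ff ff ff. In a 16 bit build the listing would be one byte shorter (no 0x66 prefix), so the jump would sit at 0x16; that address is my assumption, and with it a target of 0x0 gives e9 e7 ff.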
db 0xb4,0x01
db 0xeb,0x02
db 0x90
db 0x90
So now what if I wanted to change one of the nops being jumped over to a mov?
db 0xb4,0x01
db 0xeb,0x02
db 0xb4,0x11
db 0x90
But it's broken now; I have to fix the jump:
db 0xb4,0x01
db 0xeb,0x03
db 0xb4,0x11
db 0x90
Now change that to an add
db 0xb4,0x01
db 0xeb,0x03
db 0x80,0xc4,0x01
db 0x90
Now I have to change the jump again
db 0xb4,0x01
db 0xeb,0x04
db 0x80,0xc4,0x01
db 0x90
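The bookkeeping in those edits is just adding up the lengths of whatever sits between the jump and its target. A small sketch, with a made-up helper name, for the forward short jump case only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset byte for a forward short jump that skips the
   given instructions. lengths[] holds the size in bytes of each one. */
uint8_t forward_skip_rel8(const int *lengths, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += lengths[i];
    assert(total >= 0 && total <= 127); /* must fit a positive signed byte */
    return (uint8_t)total;
}
```

Two nops (1 and 1) give 0x02, a mov and a nop (2 and 1) give 0x03, and an add and a nop (3 and 1) give 0x04, the three values patched in by hand above.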
But had I programmed that jmp one in assembly language, I would not have to deal with that; the assembler does it. It gets worse when your jump is right on the cusp of the distance limit and you have, say, some other jumps within that loop. You have to go through the code several times to see whether each of those other jumps is 2 or 3 or 4 bytes, and whether that pushes my longer jumps over the edge from a one byte offset to a larger one.
a:
...
jmp x
...
jmp a
...
x:
As we pass jmp x, do we allocate 2 bytes for it? Then we get to jmp a and allocate two bytes for it as well. At that point we may have figured out all the rest of the instructions between jmp a and a:, and it just fits in a two byte jump. But then we eventually get to x: and find that jmp x needs to be 3 bytes. That pushes jmp a too far, so now it has to be a three byte jmp, which means we have to go back to jmp x and adjust for the additional byte from jmp a being three bytes instead of the assumed 2.
The assembler does all of this for you. If you want to program machine code directly, first and foremost, how are you going to keep track of the hundreds of different instructions without some natural language notes?
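That going back and forth can be sketched as a loop that keeps re-checking until nothing changes. This is only an illustration of the idea under a simplified model (every jump is either 2 bytes short or 3 bytes near, as on a 16 bit target), not how any particular assembler implements it; struct item and both function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one entry per instruction. For a jump, target is the
   index of the entry it goes to and size starts at 2. For anything else,
   target is -1 and size is its fixed length in bytes. */
struct item { int size; int target; };

/* Address of entry idx: the sum of the sizes of everything before it. */
static long address_of(const struct item *items, size_t idx)
{
    long addr = 0;
    for (size_t i = 0; i < idx; i++)
        addr += items[i].size;
    return addr;
}

/* Widen short jumps to 3 bytes until every short offset fits in a signed
   byte. Returns the number of passes made over the list. */
int relax_jumps(struct item *items, size_t count)
{
    int passes = 0;
    bool changed = true;
    assert(items != NULL);
    while (changed) {
        changed = false;
        passes++;
        for (size_t i = 0; i < count; i++) {
            if (items[i].target < 0 || items[i].size != 2)
                continue; /* not a jump, or already widened */
            long end = address_of(items, i) + 2;
            long rel = address_of(items, (size_t)items[i].target) - end;
            if (rel < -128 || rel > 127) {
                items[i].size = 3; /* 0xE9 plus a 16 bit offset */
                changed = true;    /* addresses moved, so check again */
            }
        }
    }
    return passes;
}
```

For example, with entries {100,-1},{2,5},{24,-1},{2,0},{110,-1},{1,-1} the backward jump at index 3 fits exactly (offset -128) while the forward jump at index 1 is still 2 bytes, but the forward jump needs 3 bytes, and that extra byte pushes the backward jump out of range as well. Sizes only ever grow, so the loop has to stop.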
So I can do this
mov ah,01h
top:
add ah,01h
nop
nop
jmp top
then
nasm so.s -o so
hexdump -C so
00000000  b4 01 80 c4 01 90 90 eb  f9                       |.........|
00000009
Or I can do this:
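#include <stdio.h>
/* the same nine bytes the assembler produced above */
unsigned char data[]={0xb4,0x01,0x80,0xc4,0x01,0x90,0x90,0xeb,0xf9};
int main ( void )
{
    FILE *fp;
    fp=fopen("out.bin","wb");
    if(fp==NULL) return(1);
    if(fwrite(data,1,sizeof(data),fp)!=sizeof(data)) { fclose(fp); return(1); }
    fclose(fp);
    return(0);
}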
I want to add a nop to the loop:
mov ah,01h
top:
add ah,01h
nop
nop
nop
jmp top
vs
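#include <stdio.h>
/* one more 0x90 in the loop, so the jump offset changes from 0xf9 to 0xf8 */
unsigned char data[]={0xb4,0x01,0x80,0xc4,0x01,0x90,0x90,0x90,0xeb,0xf8};
int main ( void )
{
    FILE *fp;
    fp=fopen("out.bin","wb");
    if(fp==NULL) return(1);
    if(fwrite(data,1,sizeof(data),fp)!=sizeof(data)) { fclose(fp); return(1); }
    fclose(fp);
    return(0);
}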
If I was really trying to write in machine code I would have to do something like this:
unsigned char data[]={
    0xb4,0x01,      //mov ah,01h
                    //top:
    0x80,0xc4,0x01, //add ah,01h
    0x90,           //nop
    0x90,           //nop
    0x90,           //nop
    0xeb,0xf8       //jmp top
};
To remain sane. There are some instruction sets, ones I have used and ones I made for myself for fun, that were easier to program in machine code, but even those are better done with comments in pseudocode using assembly mnemonics...
If your goal is simply to end up with some blob of machine code in some format, bare metal or otherwise, rather than a Windows or Linux executable file, you use assembly language, and in one or two steps of the toolchain you get from the assembly source to the binary machine code. Worst case, you write an ad hoc program that takes the output of the toolchain and manipulates those bits into other bits. You don't toss out the available tools in order to write raw bits by hand; you just reformat the output file.
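As one example of such an ad hoc step, here is a sketch that pads a flat binary out to the 512 byte first sector the question talks about and puts the 0x55 0xAA boot signature in the last two bytes. The function name is made up, and it assumes nothing else has to live in the sector; if a partition table is present (it starts at offset 446) the room for code is smaller than the 510 bytes allowed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: copy len bytes of code to the start of a zero filled
   sector and append the boot signature. Returns 1 on success, 0 if the code
   does not leave room for the two signature bytes. */
int make_boot_sector(const uint8_t *code, size_t len, uint8_t sector[SECTOR_SIZE])
{
    assert(code != NULL && sector != NULL);
    if (len > SECTOR_SIZE - 2)
        return 0;
    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, code, len);
    sector[SECTOR_SIZE - 2] = 0x55; /* boot signature, offset 510 */
    sector[SECTOR_SIZE - 1] = 0xAA; /* boot signature, offset 511 */
    return 1;
}
```

The resulting 512 bytes could then be written out with the same fopen/fwrite pattern as the programs above.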