
What's the use of "org xxxx" in assembly?

I've recently been learning how to write a boot sector. Here is the complete code I'm studying:

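org 07c00h
    mov ax, cs
    mov ds, ax
    mov es, ax              ; DS = ES = CS
    call DispStr
    jmp $                   ; loop forever

DispStr:
    mov ax, BootMessage
    mov bp, ax              ; ES:BP -> string to print
    mov cx, 16              ; CX = character count (the string is 10 bytes; the rest is the zero padding after it)
    mov ax, 01301h          ; AH = 13h (write string), AL = 01h (attribute in BL, move cursor)
    mov bx, 000ch           ; BH = page 0, BL = attribute 0Ch (bright red on black)
    mov dl, 0               ; DL = column 0 (DH, the row, is whatever the BIOS left there)
    int 10h
    ret

BootMessage: db "Hello, OS!"
times 510-($-$$) db 0       ; pad with zeros up to byte 510

dw 0xaa55                   ; boot signature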

It's very simple code if you know how a system boots. The result is the line Hello, OS! displayed on the screen. The only thing I don't understand is the first line: org 07c00h.

The book says this line lets the compiler locate the address at 7c00h, but the explanation is very ambiguous, and I really don't understand what it's for here. What in the world does the line org 07c00h do?

I tried removing the line, used nasm to create a bin file, and then booted the bin file in bochs. Nothing changed: "Hello, OS!" was displayed on the screen just as before.

Can I say that the first line does nothing here? What's the use of org xxxx?

Searene asked Jan 17 '23 20:01



1 Answer

The assembler translates each line of your source code into processor instructions and emits them in sequence, one after another, into the output binary file. While doing so, it maintains an internal counter that tracks the current address of each instruction, starting from 0 and counting upwards.

If you're assembling a normal program, these instructions end up in the code section of an object file with blank slots left for the addresses. The linker fills those slots in with the proper addresses afterwards, so this is not a problem.

But when you assemble a flat binary file with no sections, relocations, or other formatting, just raw machine instructions, the assembler has no information about where your labels point or what the addresses of your code and data are. So, for example, when you have an instruction mov si, someLabel, the assembler can only calculate the offset of this label counting from 0 at the beginning of the binary file. (That is, the default is ORG 0 if you don't specify one.)

If that's not the case, and you want your machine instructions and data to begin at some other address in memory, e.g. 7C00, then you need to tell the assembler that the starting address of your program is 7C00 by writing org 0x7C00 at the beginning of your source. This directive tells the assembler to start its internal address counter at 7C00 instead of 0. As a result, every address used in the program is shifted by 7C00: the assembler simply adds 7C00 to the address it calculates for each label. The effect is as if a label were located in memory at address, say, 7C48 (7C00 + 48) instead of just 0048 (0000 + 48), even though it is only 48 (hex) bytes from the beginning of the binary image file. Once the image is loaded at offset 7C00, that is the label's actual address.
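To see the shift for yourself, here is a minimal sketch (the file name demo.asm and the labels start and msg are made up for illustration). Assemble it once as written and once with org 0, then compare the two output files in a hex dump:

```nasm
; Assemble with: nasm -f bin demo.asm -o demo.bin
bits 16
org 0x7C00              ; change this to "org 0" and compare the output bytes

start:
    mov si, msg         ; 3 bytes: opcode BE + 16-bit immediate (msg's address)
    jmp short $         ; 2 bytes: EB FE, a relative jump, identical for any org

msg: db "Hi"            ; file offset 5, so the assembler uses 0x7C00 + 5 = 0x7C05

; With org 0x7C00 the file is: BE 05 7C EB FE 48 69
; With org 0      the file is: BE 05 00 EB FE 48 69
```

Only the immediate operand of mov si, msg changes. The file is the same size and the bytes sit at the same file offsets either way, because org only affects the numbers the assembler substitutes for labels.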

These "addresses", when used directly as in jmp si or mov al, [si], are the offset part of seg:off logical addressing. In real mode, the segment part is shifted left by 4 bits to form a base, and the offset is added to it. (So 07C0:0000 and 0000:7C00 refer to the same linear address, 7C00.) The segment part comes from whatever you've put into the relevant segment register, or whatever the BIOS left there if you didn't set it to a fixed value.

An offset only reaches your data when it is used with a correctly set segment base. Your MBR is always loaded at linear address 7C00, so the segment base plus your org value has to equal 7C00; with org 0, for example, the first byte of your file must be at es:0. jmp si will jump to a label if cs is set so that cs:si is where your code is, i.e. if cs:org refers to the first byte of your MBR. mov ax, [si] will load 2 bytes from the label if ds is set correctly.

In your case, int 10h/ah=13h uses es:bp, and nothing else uses absolute addressing; there are only relative jumps and calls, whose encoding doesn't depend on org. For some reason you set es from cs at the start of the bootloader instead of setting it to a fixed value that matches the org you're using. This is a bug: your bootloader won't work on BIOSes that jump to the MBR with CS:IP = 07C0:0000, only on ones that use 0000:7C00, which matches your org. Fix this by replacing mov ax,cs with xor ax,ax. It doesn't matter whether DS/ES differ from CS, only that ES:(BootMessage-$$ + org) is where your data actually is.
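A sketch of the boot sector with that fix applied might look like this. Beyond the xor ax,ax change, it makes two small adjustments that are my own assumptions rather than part of the fix: the length is computed with a hypothetical BootMessageLen constant instead of the hard-coded 16, and DH is zeroed along with DL so the row is defined. It still relies on whatever stack the BIOS left in SS:SP.

```nasm
bits 16
org 0x7C00

start:
    xor ax, ax              ; AX = 0, regardless of what CS happens to be
    mov ds, ax
    mov es, ax              ; ES:BP = 0000:BootMessage, which matches org 0x7C00
    call DispStr            ; relative call: works for either CS:IP the BIOS used
    jmp $

DispStr:
    mov bp, BootMessage     ; ES:BP -> string
    mov cx, BootMessageLen  ; number of characters to write
    mov ax, 0x1301          ; AH = 13h (write string), AL = 01h (attribute in BL, move cursor)
    mov bx, 0x000C          ; BH = page 0, BL = attribute 0Ch
    xor dx, dx              ; DH = row 0, DL = column 0
    int 0x10
    ret

BootMessage:    db "Hello, OS!"
BootMessageLen  equ $ - BootMessage

times 510-($-$$) db 0
dw 0xAA55
```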


Linear vs. Logical addresses

As to your other question: 7C00 is the linear physical address of the bootloader. You can represent this physical address as a logical address (segment:offset) in different ways, because segments overlap: each segment starts 16 bytes (10 in hex) after the previous one. For example, you can use the logical address 0000:7C00, which is the simplest configuration: you use segment 0, starting at the beginning of your RAM, and offset 7C00 from that 0. Or you can use the logical address 07C0:0000, which is segment number 7C0. Remember that segments start 16 bytes apart from each other? So you simply multiply this 7C0 by 10 hex (16 in decimal) and you get 7C00. See? It's just a matter of shifting the hexadecimal number one digit to the left! :-) Now you add your offset, which is 0 this time, so it's still 7C00 physically: byte 0 of segment 07C0, which starts at 7C00 in memory.
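If you'd like the assembler to check this arithmetic for you, one possible sketch uses NASM's preprocessor. LINEAR is a hypothetical macro name made up for this example; the file produces no output bytes, it just fails to assemble if either equality is wrong:

```nasm
; linear address = segment * 16 + offset  (segment shifted left by one hex digit)
%define LINEAR(seg,off) (((seg) << 4) + (off))

%if LINEAR(0x0000,0x7C00) != 0x7C00
  %error "0000:7C00 is not linear 7C00"
%endif

%if LINEAR(0x07C0,0x0000) != 0x7C00
  %error "07C0:0000 is not linear 7C00"
%endif
```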

Of course you can also use more complicated addresses, such as 0234:58C0, which means that the segment starts at 2340, and when you add the offset 58C0 to it you get 7C00 again :-) But doing that could be confusing. It all depends on what configuration you need. If you want to treat the physical address 7C00 as the start of your segment, just use segment 07C0 and your first instruction will be at offset 0, so you don't need an org directive at all, or you can write org 0. But if you need to read or write data below address 7C00 (for example, to peek at the BIOS data or fiddle with the interrupt vectors), then use segment 0 and offset 7C00, which means your first instruction (byte 0 of your binary file) will be located at physical address 7C00 in memory; in that case you have to add the org 0x7C00 directive, for the reasons described above.
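As an illustration of the first configuration, here is a sketch of a boot sector that uses segment 07C0 with org 0. The labels start, msg and msg_len are names invented for this example, and it assumes the BIOS left a usable stack for int 10h:

```nasm
bits 16
org 0                       ; label offsets are relative to the start of the image

start:
    mov ax, 0x07C0          ; segment whose base is linear 0x7C00
    mov ds, ax
    mov es, ax              ; ES:BP = 07C0:msg -> linear 0x7C00 + msg
    mov bp, msg
    mov cx, msg_len
    mov ax, 0x1301          ; AH = 13h (write string), AL = 01h
    mov bx, 0x000C          ; BH = page 0, BL = attribute 0Ch
    xor dx, dx              ; DH = row 0, DL = column 0
    int 0x10
    jmp $

msg:     db "Hello, OS!"
msg_len  equ $ - msg

times 510-($-$$) db 0
dw 0xAA55
```

Because ES is loaded with a fixed value and the only jump is relative, this version does not care whether the BIOS entered with CS:IP = 07C0:0000 or 0000:7C00.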


The BIOS will jump to your code with CS:IP = 07C0:0000 or 0000:7C00, and with unknown values in DS, ES, and SS:SP. You should write your bootloader to work either way, using xor ax,ax / mov ds,ax to set the DS base to zero if you're using org 0x7c00.
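A start-up sequence along those lines could be sketched as follows. The stack location (growing down from 7C00) and the far jump that forces CS to 0 are common conventions that I'm assuming here, not requirements; the label main is a placeholder for the rest of your code.

```nasm
bits 16
org 0x7C00

start:
    cli                     ; no interrupts while SS:SP is being changed
    xor ax, ax
    mov ds, ax              ; DS base = 0, matching org 0x7C00
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00          ; stack grows down from just below the boot sector
    sti
    jmp 0x0000:main         ; far jump: reloads CS with 0 so CS:IP matches org too

main:
    cld                     ; string instructions count upwards
.halt:
    hlt                     ; placeholder: real boot code would go here
    jmp .halt

times 510-($-$$) db 0
dw 0xAA55
```

Everything before the far jump uses no absolute addresses, so it runs correctly under either CS:IP value. The far jump itself is only needed if later code depends on CS, for example through near absolute jumps such as jmp si.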

See Michael Petch's general tips for bootloader development for more about writing robust bootloaders that avoid making assumptions about the state the BIOS left, except for ones that all BIOSes must get right to work at all with mainstream software. (e.g. loading your 512-byte MBR at linear address 0x00007c00 and drive number in DL).

Almost(?) all BIOSes start an MBR with either CS=0 or CS=07C0, not some other seg:off way of reaching the same linear address. But you definitely shouldn't assume one or the other.

4 revs, 3 users 58% answered Feb 06 '23 08:02
