
Placing an instruction at the address pointed to by the reset vector using the times and align NASM directives

I've been thinking for a while about the following assembly code (NASM IA-32):

ORG 0xFF000 ; This is (1MB - 4KB) 0x100000 - 0x1000=0xFF000.
USE16 ;produce 16bit code
code_size EQU (end -init_16) ; calculates code length

times (4096-code_size) db 0x90 ; fills the rest of the memory with NOP's
init_16:
    cli ;disables interrupts (not really necessary, just an example)
    jmp init_16 ;infinite loop
align 16
end :

It's just an example. The idea is that we have an IA-32 processor in real mode, and in the top 4 KB of the first megabyte of memory we have an NVRAM (non-volatile RAM). The reset vector points to 0xFFFF0, so the code tries to place the cli instruction at address 0xFFFF0 regardless of how many instructions are placed between the init_16 label and the align 16 directive (limited to 16 bytes so that everything fits below the 1 MB boundary). But I can't understand how it does it.

I'm particularly troubled by the align 16 and times directives, because each seems to depend on the result of the other, so I don't know how NASM resolves this.

First, the times directive needs the result of the align 16 directive: times needs to know how many bytes align 16 added in order to compute code_size and fill the rest of the memory with NOPs.

The align directive, in turn, needs the result of the times directive in order to know where the jmp instruction ended up, and then calculate how many NOPs it has to add to reach the next 16-byte-aligned position.

So it seems to me that both directives depend on the result of the other.

Furthermore, I can't figure out why the cli instruction always ends up at address 0xFFFF0 regardless of whether I add instructions between the cli and the jmp. That is the objective, but I don't know how it works.

I think that the two directives form an underdetermined system, so there are many different solutions. For example, for the code I presented above, I think one solution could be:

The cli instruction ends up at 0xFFFF1, the jump instruction at 0xFFFF2, and align 16 fills the addresses 0xFFFF2 to 0xFFFFF with NOPs. So the code_size label is now defined, and the times directive fills the addresses 0x0000 to 0xFFFF0 with NOPs.

Why is this not the behavior of the code?

Gaston asked Jan 27 '21 22:01


1 Answer

Firstly, I find it strange to see ORG 0xFF000 together with USE16. I think that in real address mode, ORG is meant to be a 16-bit offset within a 64 KB segment.

It's the wonder of multi-pass assemblers

On the first pass, the assembler does not yet know the end and init_16 labels, so it could just skip the times that depends on them. This would leave the current offset ($) at the ORG value. Then come the 3 bytes that encode cli and the short jump jmp init_16, followed by the 13 bytes of padding produced by align 16.
At this point both labels are known, and a following pass can start using their offsets. code_size is calculated to be 16 (the difference between the two labels), so times fills 4080 bytes with NOPs (4096-16).
Although the two labels have now moved up in memory by 4080 bytes, their difference is still the same (16), so no further passes are needed. The code is resolved.
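
The pass-by-pass reasoning above can be sketched as a small fixed-point iteration. This is only a simplified model in Python, not NASM's actual algorithm: the function names layout and resolve are made up for illustration, and the first pass is assumed to emit nothing for times, as suggested above.

```python
ORG = 0xFF000   # load address from the ORG directive
REGION = 4096   # size of the region: times emits (REGION - code_size) bytes
ALIGN = 16      # argument of the align directive

def layout(body_size, code_size_guess):
    """One pass: place the labels given a guess for code_size.
    Returns (address of init_16, address of end)."""
    pad = REGION - code_size_guess                     # bytes emitted by times
    init_16 = ORG + pad
    after_body = init_16 + body_size                   # first byte after jmp
    end = (after_body + ALIGN - 1) // ALIGN * ALIGN    # effect of align 16
    return init_16, end

def resolve(body_size, max_passes=10):
    """Repeat passes until code_size stops changing."""
    guess = REGION  # assumption: on the first pass times contributes 0 bytes
    for passes in range(1, max_passes + 1):
        init_16, end = layout(body_size, guess)
        code_size = end - init_16
        if code_size == guess:
            return init_16, end, passes
        guess = code_size
    raise RuntimeError("layout did not converge")

# cli (1 byte) + short jmp (2 bytes) = 3 bytes of code
init_16, end, passes = resolve(body_size=3)
print(hex(init_16), hex(end), passes)  # 0xffff0 0x100000 2
```

In this model, layout(3, 15) returns 0xFFFF1 and 0x100000, whose difference is again 15, so the alternative proposed in the question is self-consistent too. The iteration just never reaches it from the starting point assumed here.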

Furthermore, I can't figure out why the cli instruction always ends up at address 0xFFFF0 regardless of whether I add instructions between the cli and the jmp. That is the objective, but I don't know how it works.

Adding a few instructions right after this cli does not change the procedure outlined above, as long as the difference between the two labels stays 16. You could insert up to 13 bytes' worth of instructions.
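
The 13-byte limit can be checked with a little arithmetic. The sketch below is a hypothetical Python model with illustrative names (code_size, entry_address, BASE). It assumes init_16 sits on a 16-byte boundary and that the jmp keeps its 2-byte short encoding.

```python
ALIGN = 16

def code_size(body_size):
    """Distance from init_16 to end: the body rounded up to a multiple
    of 16, assuming init_16 itself is 16-byte aligned."""
    return (body_size + ALIGN - 1) // ALIGN * ALIGN

def entry_address(body_size, org=0xFF000, region=4096):
    """Address of init_16 after times (4096-code_size) has padded the front."""
    return org + region - code_size(body_size)

BASE = 3  # cli (1 byte) + short jmp (2 bytes)

print(hex(entry_address(BASE)))       # 0xffff0
print(hex(entry_address(BASE + 13)))  # 0xffff0, the body is exactly 16 bytes
print(hex(entry_address(BASE + 14)))  # 0xfffe0, code_size has grown to 32
```

With 14 or more extra bytes, code_size becomes 32 and the entry point moves down to 0xFFFE0, so the cli no longer sits at the reset vector.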

Sep Roland answered Oct 20 '22 11:10