
Importance of page count and last page size in an MZ (DOS, 16 bit) .EXE header

I am trying to learn how to create DOS .EXE files in assembly (NASM), constructing the header by hand and assembling the file as a flat binary. I have a problem with the page fields (both the total number of pages and the byte count in the final page). No matter how small I make these values, the program still works.

As an extreme case, the following program still runs even when the header declares 1 page with only 1 byte used:

;
; the smallest possible "Hello, World!" .EXE (DOS MZ) file
; assemble with:
; nasm -f bin -w+all -O0 smallest_hello_exe.asm -o ASM.EXE
;

bits 16
cpu 8086

;
; by setting cs:ip=-10h:100h instead of 0h:0h inside the .EXE header
; (identical assignments), we achieve the following two advantages:
; 1) ds==cs, so no "push cs pop ds" is needed in order for ds:dx
; to point to the message string
; 2) we can exit by int 20h instead of int 21h, thus omitting the
; ah=4ch assignment
; (int 20h requires that cs points to the PSP segment)
;

;
; we do not want the address calculations to take the .EXE header into account
; so we must subtract its length (20h) by an "org -20h"
; but, since ip will be 100h, we must also issue an "org 100h"
; and, since 0x100-0x20=0xE0...

org 0xE0        ; 100h for ip value - 20h for header



section .text align=1
;
; the MZ .EXE header structure
; 28 bytes long
; 1 paragraph equals 16 bytes
; 1 page equals 512 bytes
; suggested reading: int 21h,ah=4bh procedure
;
host_exe_header:
.signature: dw 'MZ'     ; the 'MZ' characters
.last_page_size: dw 1   ; number of used bytes in the final file page, 0 for all
.page_count: dw 1       ; number of file pages including any last partial page
.reloc: dw 0            ; number of relocation entries after the header
.paragraphs: dw 2       ; size of header + relocation table, in paragraphs
.minalloc: dw 0         ; minimum required additional memory, in paragraphs
.maxalloc: dw 0xFFFF    ; maximum memory to be allocated, in paragraphs
.in_ss: dw 0            ; initial relative value of the stack segment
.in_sp: dw 0xF000       ; initial sp value
.checksum: dw 0         ; checksum: 1's complement of sum of all words
.in_ip: dw 100h         ; initial ip value
.in_cs: dw -10h         ; initial relative value of the text segment
.offset: dw 0           ; offset of the relocation table from start of header
.overlay: dw 0          ; overlay value (0h = main program)

; pad header (its size in bytes must be a multiple of 16)
times (32-$+$$) db 0

mov dx,message
mov ah,09h              ; write string ds:dx to stdout
int 21h
int 20h

section .data align=1
message: db 'Hello, World!$'

section .bss align=1

Experimenting with different program sizes, I have come to the conclusion that DOS loads all 512 bytes of each page into memory. If so, what is the purpose of the byte count for the last page?

Can it interfere with .bss, stack data, and/or dynamic memory allocations?

padawan asked Jan 09 '13 21:01


1 Answer

The total page count is definitely not ignored. It is even relied on by programs that don't want their whole file loaded initially and that read the remaining fragments themselves later. The bytes-in-last-page field may or may not be ignored, depending on the OS version. It could also be rounded up to a paragraph or disk sector boundary. You shouldn't depend on any particular behavior; just fill the field in properly.
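As a rough illustration of filling the fields in properly: both values follow from the file size and the 512-byte page size given in the question's header comments. The sketch is in Python rather than NASM purely for convenience, and the name `mz_page_fields` is made up for this example.

```python
PAGE_BYTES = 512  # one MZ page, as stated in the question's header comments


def mz_page_fields(file_size):
    """Return (page_count, last_page_size) for an .EXE file of file_size bytes.

    Hypothetical helper, for illustration only.
    """
    if file_size <= 0:
        raise ValueError("file size must be positive")
    # Number of pages, counting a partial last page as a whole one.
    page_count = (file_size + PAGE_BYTES - 1) // PAGE_BYTES
    # Bytes used in the last page; 0 means the last page is completely full.
    last_page_size = file_size % PAGE_BYTES
    return page_count, last_page_size


print(mz_page_fields(55))   # (1, 55)
print(mz_page_fields(512))  # (1, 0)
print(mz_page_fields(567))  # (2, 55)
```

The sizes passed in are arbitrary examples; substitute the actual size of the assembled file.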

Your test code works because it's small and your particular OS happens to load enough of it into memory. If you make your program larger than a single page but still specify 1 in the page count field, your code will probably not be fully loaded and won't work. I tried:

times (32-$+$$) db 0
times (512) nop
mov dx,message
mov ah,09h              ; write string ds:dx to stdout
int 21h
int 20h

This fails if the page count is 1, but works if the page count is 2 (tested with DOSBox).
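The arithmetic behind this result can be sketched as follows, assuming the loader reads exactly the declared number of whole 512-byte pages (consistent with what the question observed, but not a guarantee for every DOS version). The 32-byte header and the 512 `nop` bytes come from the code above; `covered_by_pages` is a made-up name.

```python
PAGE_BYTES = 512
HEADER_BYTES = 32   # padded header from the question
NOP_BYTES = 512     # the "times (512) nop" filler

# File offset of "mov dx,message", the first instruction after the filler.
first_real_instruction = HEADER_BYTES + NOP_BYTES


def covered_by_pages(file_offset, page_count):
    """True if file_offset lies inside the pages declared in the header.

    Assumption: the loader reads exactly page_count whole pages.
    """
    return file_offset < page_count * PAGE_BYTES


print(first_real_instruction)                       # 544
print(covered_by_pages(first_real_instruction, 1))  # False
print(covered_by_pages(first_real_instruction, 2))  # True
```

With a page count of 1, only file offsets 0 to 511 are declared, so the instructions starting at offset 544 fall outside the loaded image.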

Jester answered Dec 31 '22 20:12