 

How can I get an extra segment in DOS?

I'd like to write a little DOS program (my first one) and I'm a little bit inexperienced.

For the program, I need more than 64 kilobytes of (conventional) memory. How can I get extra memory? Ideally, I'd like to have two extra 64k blocks of memory for the program. Can I just start to write data somewhere into the address space or do I need to request extra memory?

fuz asked May 29 '16 11:05



2 Answers

Under DOS, yes, you can just start using another segment of memory. There is an important caution, however!

Have a look at a memory map for the version of DOS that you are using. You want to be sure that you aren't selecting a region of memory that is actually reserved for another purpose. Here is one from Dr. Dobb's Journal:

Address (Hex)            Memory Usage

0000:0000                Interrupt vector table
0040:0000                ROM BIOS data area
0050:0000                DOS parameter area
0070:0000                IBMBIO.COM / IO.SYS *
mmmm:mmmm                IBMDOS.COM / MSDOS.SYS *
mmmm:mmmm                CONFIG.SYS - specified information
                         (device drivers and internal buffers)
mmmm:mmmm                Resident COMMAND.COM
mmmm:mmmm                Master environment
mmmm:mmmm                Environment block #1
mmmm:mmmm                Application program #1
    .                        .
    .                        .
    .                        .
mmmm:mmmm                Environment block #n
mmmm:mmmm                Application #n
xxxx:xxxx                Transient COMMAND.COM
A000:0000                Video buffers and ROM
FFFF:000F                Top of 8086 / 88 address space

The "official" memory allocation mechanism is through memory control blocks (MCBs) and DOS interrupt 21h, using function AH=48h to allocate memory and AH=49h to free it. A good discussion of this can be found in this Microsoft support document.

For documentation on the interrupt approach, you might look here.

David Hoelzer answered Sep 30 '22 14:09



I recently stumbled on this question. Despite it being a few years old, I felt that some additional information beyond the current answers may be useful for future readers.


This question really boils down to: Can I arbitrarily write to memory beyond the extent of my program that DOS has allocated? The question is geared towards DOS COM programs, but much of the information applies to DOS EXE programs as well.

The GNU assembler is limited in that it doesn't generate 16-bit DOS EXE programs, so you have to generate DOS COM programs. DOS COM programs have an origin point of 0x100. The code, data and stack can't exceed 64KiB of memory (at load time). DOS COM programs have these characteristics once loaded into memory by the DOS loader:

  • Upon entry DS=ES=SS=CS.
  • The program is relocatable to any segment, and doesn't contain load time fixups/relocations.
  • The program is allocated the largest contiguous free block from the DOS memory pool even though the DOS COM program when loaded is limited to <= 64KiB of memory. The DOS loader effectively allocates the entire free pool to your COM program.
  • The DOS loader always sets SS=CS, but SP may start at a value other than 0x0000 (see footnote 1) if the amount of available space for our program is less than 64KiB.
  • The DOS loader always pushes a value of 0x0000 onto the stack prior to transferring control to CS:0x0100 to start our program. CS:0x0000 is the start of the PSP and the PSP starts with the 2 byte instruction (0xcd 0x20) Int 20h. Int 20h terminates the current program. This is the mechanism that allows a DOS COM program to do a ret to terminate the program.
  • There is a program control block called the Program Segment Prefix (PSP) that DOS places in memory between CS:0x0000 and CS:0x0100.
  • COM programs start executing at CS:0x0100.

The first question one should ask is: How much memory does my DOS COM program actually have? The simple answer is: it varies. It depends on the amount of conventional memory installed (IBM PCs generally came with 64KiB, 128KiB, 256KiB, 512KiB, or 640KiB). The Dr. Dobb's Journal article cited in another answer was published in 1988, and its memory map is missing some crucial things.

In 1987 IBM released the IBM PS/2 line of computers. In order to save mouse related information, IBM realized there wasn't enough space in the BIOS Data Area above the interrupt vector table so they created an Extended BIOS Data Area (EBDA). This memory is reserved by the BIOS, and the IBM PS/2 BIOS started reporting 1KiB less memory (639KiB instead of 640KiB). The EBDA can be of varying sizes depending on the BIOS manufacturer. The BIOS Int 12h call will return the amount of conventional memory (<=640KiB) excluding the EBDA region. DOS relies on this to determine how much memory it has available to use.

To make things worse, when the 386SL based systems were released they included System Management Mode, which runs at ring -2 and has complete access to your PC. These systems started using space in the EBDA as well. Some systems required more than 1KiB. In theory you could have 128KiB of EBDA space, although I'm not sure if any systems ever had that! This area was eventually used for power management (APM), ACPI, and SMBIOS, and it was possible for System Management Mode to write to it at any time. OSes often treat this area as reserved for that reason. What actually happens depends on the BIOS and the machine's manufacturer.

Beyond the EBDA, some DOS programs (and malware) intercept BIOS Int 12h and report less memory in order to hide (or make resident) a piece of code/data that DOS shouldn't touch. The Dr. Dobb's memory map could use a couple of additions:

mmmm:mmmm                Environment block #1
mmmm:mmmm                Application program #1
    .                        .
    .                        .
    .                        .
mmmm:mmmm                Environment block #n
mmmm:mmmm                Application #n
xxxx:xxxx                Transient COMMAND.COM
hhhh:hhhh                Hidden/Resident programs and data
eeee:eeee                Extended BIOS Data Area
A000:0000                Video buffers and ROM
FFFF:000F                Top of 8086 / 88 address space

Moral of the story: you shouldn't assume the amount of memory available to you runs between CS:0x0000 and 0xa000:0x0000 (see footnote 2).

You can tell which region of memory is exclusive to your program by looking at the PSP, in particular the WORD value at offset CS:0x0002:

02h-03h word (2 bytes) Segment of the first byte beyond the memory allocated to the program

By reading this value you can get the segment of the first byte just beyond what your program has been allocated (we'll call it NEXTSEG). Often NEXTSEG will be 0xA000 or 0x9FC0 (a system with a 1KiB EBDA would have this value). It will vary by hardware for the reasons discussed previously. The area will overlap the transient portion of MS-DOS's COMMAND.COM. Realistically, the only memory we can guarantee is exclusive to our COM program after it has been loaded is the physical memory between CS:0x0000 and NEXTSEG:0x0000, and we are free to use all of it.
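As a rough illustration of where values such as 0xA000 and 0x9FC0 come from, the following Python sketch converts a conventional-memory size in KiB (the kind of figure BIOS Int 12h reports) into a segment, and computes how many bytes lie between a PSP segment and NEXTSEG. The function names and the PSP segment 0x1000 are assumptions chosen for the example; they are not taken from DOS or from the answer.

```python
PARAGRAPH = 16  # bytes per paragraph


def kib_to_segment(kib):
    """Segment of the first byte past `kib` KiB of conventional memory."""
    return kib * 1024 // PARAGRAPH


def bytes_owned(psp_seg, next_seg):
    """Bytes between psp_seg:0x0000 and next_seg:0x0000."""
    return (next_seg - psp_seg) * PARAGRAPH


# 640KiB with no EBDA, and 639KiB with a 1KiB EBDA
print(hex(kib_to_segment(640)))    # 0xa000
print(hex(kib_to_segment(639)))    # 0x9fc0

# Hypothetical COM program whose PSP sits at segment 0x1000
print(bytes_owned(0x1000, 0x9FC0))  # 588800
```

On a real machine the PSP segment is wherever DOS loaded the program, so the usable size has to be computed at run time from the PSP word at offset 0x0002.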


COM program allocating 128KiB

Because of the overlapping nature of 20-bit segment:offset addressing each segment points to the start of a different 16-byte region in memory called a paragraph. Incrementing a segment by 1 advances 16 bytes in memory and decrementing goes back 16 bytes. This is important in doing the required arithmetic to find out how much our program needs and ensuring enough memory is available to satisfy the request.
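The paragraph arithmetic can be checked with a few lines of Python. This is only a sketch of 8086 real-mode address formation; the helper names linear and paragraphs are made up for the example.

```python
def linear(seg, off):
    """20-bit physical address of seg:off on an 8086 (wraps at 1MiB)."""
    return ((seg << 4) + off) & 0xFFFFF


def paragraphs(nbytes):
    """Number of 16-byte paragraphs needed to hold nbytes (rounded up)."""
    return (nbytes + 15) // 16


# Adding 1 to the segment moves 16 bytes forward, so these two
# segment:offset pairs name the same physical byte.
print(hex(linear(0x1234, 0x0010)))  # 0x12350
print(hex(linear(0x1235, 0x0000)))  # 0x12350

# Top of the 8086 address space from the memory map
print(hex(linear(0xFFFF, 0x000F)))  # 0xfffff

print(paragraphs(128 * 1024))       # 8192
```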

128KiB is 128*1024/16=8192 paragraphs. The actual size of the region our COM program was loaded into (and where the stack is placed) is bounded by CS:0x0000 and the segment just beyond where the stack (SP) is pointing. Since DOS always pushes a 2 byte value (return address that ret will return to) for a COM program - the next paragraph can be computed by dividing SP by 16 (or SHR by 4) and adding 1 (we'll call this SEGAFTERSTACK).

The easiest thing to do is to place our 128KiB of data just beyond the upper edge of the stack (SEGAFTERSTACK). We just have to ensure that there is enough space between SEGAFTERSTACK and NEXTSEG (the extent of our program area given to us by DOS). If that value is >=8192 paragraphs then we have enough memory and we are free to access it as we see fit. If we do have enough memory we can ask DOS to resize our COM program down to the exact amount of space we need using Int 21h/AH=4ah. We don't need to resize the memory DOS already allocated for us but it can be useful if your code needs to load/run a child program with DOS's Exec function Int 21h/AH=4bh.
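The check described above can be modelled outside of DOS. The Python sketch below assumes sample values for CS, SP and NEXTSEG (on a real system they come from the registers at entry and from the PSP) and uses hypothetical function names; it only mirrors the arithmetic, not the DOS calls.

```python
EXTRA_PARAS = 128 * 1024 // 16  # 8192 paragraphs of extra data


def seg_after_stack(sp):
    """Paragraph (relative to CS) just above the stack: SP/16 + 1."""
    return (sp >> 4) + 1


def extra_fits(cs, sp, next_seg, extra=EXTRA_PARAS):
    """Return (fits, extra_data_segment) for a COM program.

    fits is True when the paragraphs between CS and NEXTSEG cover
    the code/data/stack plus the extra data area.
    """
    needed = seg_after_stack(sp) + extra   # program size in paragraphs
    available = next_seg - cs              # paragraphs DOS gave us
    return available >= needed, cs + seg_after_stack(sp)


# Assumed example: PSP at 0x1000, SP=0xFFFE after DOS pushed 0x0000
print(extra_fits(0x1000, 0xFFFE, 0x9FC0))  # (True, 8192)
# Same program with far less free memory above it
print(extra_fits(0x1000, 0xFFFE, 0x3F00))  # (False, 8192)
```

The second value in each tuple is the absolute segment where the extra data would start (8192 is 0x2000 here); the first 64KiB block would live at that segment and the second at that segment plus 0x1000.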

Note: DOS < 2.0 didn't support Memory Control Blocks, which means the Int 21h functions to allocate, free, and resize memory are unavailable there. Calling them on DOS < 2.0 will fail silently. When resizing reduces a program's size in memory the function shouldn't fail, so we should be able to ignore any errors.

A version of the program using GNU assembler that ensures we have 128KiB of free space for our program after the stack could look like this:

EXTRA_SIZE      = 128*1024     # Allocate 128KiB above stack
PARA_SIZE       = 16           # A paragraph = 16 bytes
EXTRA_SIZE_PARA = (EXTRA_SIZE+PARA_SIZE-1)/PARA_SIZE
                               # Extra Size in Paragraphs
COM_ORG         = 0x100        # Origin point for COM  program is 0x100

.code16
.global _start
.section .text

_start:
    # In a COM program CS=DS=ES=SS (the PSP segment) and IP=0x100. The PSP is a 0x100 byte structure
    # between CS:0x0000 and CS:0x0100. DOS allocates the largest free block of
    # contiguous conventional memory from the DOS memory pool to our COM program. 
    # SS:SP grows down from the last paragraph allocated to us OR the top of the
    # 64kb segment, whichever is lower.
    #
    # At (DS:[0x0002]) is the segment (NEXTSEG) of the first byte  beyond the memory
    # allocated to our program. This means our program has been allocated all memory
    # between CS:0x0000 and NEXTSEG:0x0000

    # Get the next segment just above the top of the stack
    mov %sp, %bp               # BP = Current stack pointer
    mov $4, %cl                # Compute the segment just above top of stack
                               # Where extra data will be placed
    shr %cl, %bp               #     Divide BP by 16
    inc %bp                    #     and add 1

    # Compute a new program size including extra data area we want and
    # place it above the stack
    lea EXTRA_SIZE_PARA(%bp), %bx
                               # BX = Size (paragraphs) of Code/Data+Stack+Extra Data
    mov 0x0002, %ax            # Get the segment above last allocated
                               #     paragraph of our program from PSP @ [DS:0002]
    mov %cs, %dx
    sub %dx, %ax               # AX = paragraphs allocated to us (NEXTSEG-CS)
    cmp %bx, %ax               # Do we have enough memory for the extra data?
    jb .no_mem                 #     If not  display memory error and exit
    mov $0x4a, %ah             # Request DOS resize our program's memory block
    int $0x21                  #     to exactly the # of paragraphs we need.
    push %cs
    pop %bx                    # BX = CS (first segment of our program)
    add %bx, %bp               # BP = segment at the start of our extra data

    # Do stuff. Just an example:
    lea 0x0000(%bp), %si       # SI=segment of first 64KiB segment we allocated
    lea 0x1000(%bp), %di       # DI=segment of second 64KiB segment we allocated

    jmp .exit

.no_mem:
    mov $no_mem_str, %dx       # Have DOS print an error and exit.
    mov $9, %ah
    int $0x21

.exit:
    ret                        # We're done

no_mem_str: .asciz "Out of memory\r\n$"

_end:

A slightly more complex variant is to resize the stack we were given by default to a size that is suitable for our work, and then place the 128KiB of extra data after the stack. We need to compute the extent of our code and data to place the stack just beyond it, followed by the memory for the 128KiB of data. This code does just that using a 4096 byte stack:

STACK_SIZE = 4096              # Stack size = 4KiB
EXTRA_SIZE = 128*1024          # Allocate 128KiB above stack
PARA_SIZE  = 16                # A paragraph = 16 bytes
COM_ORG    = 0x100             # Origin point for COM  program is 0x100

.code16
.global _start
.section .text

_start:
    # In a COM program CS=DS=ES=SS (the PSP segment) and IP=0x100. The PSP is a 0x100 byte structure
    # between CS:0x0000 and CS:0x0100. DOS allocates the largest free block of
    # contiguous conventional memory from the DOS memory pool to our COM program. 
    # SS:SP grows down from the last paragraph allocated to us OR the top of the
    # 64kb segment, whichever is lower.

    # At (DS:[0x0002]) is the segment (NEXTSEG) of the first byte  beyond the memory
    # allocated to our program. This means our program has been allocated all memory
    # between CS:0x0000 and NEXTSEG:0x0000
    
    push %ds
    pop %cx                    # CX = Segment at start of our program
    mov %cx, %bp               # BP = A copy (for later) of program starting segment
    mov $PROG_SIZE_PARA, %bx   # BX = total size of our program in paragraphs
    add %bx, %cx               # CX = segment just past the end of the memory we need
    mov 0x0002, %ax            # AX = next segment past end of our program
                               #     retrieved from our program's PSP @ [DS:0002]
    sub %cx, %ax               # Do we have enough memory to satisfy the request?
    jb .no_mem                 #     If not  display memory error and exit
    mov $0x4a, %ah             # Request DOS resize our programs memory block
    int $0x21                  #     to exactly the # of paragraphs we need.

    mov $STACK_TOP_OFS, %sp    # Place the stack after non-BSS code and data
                               #     and before the BSS (Extra) memory
    xor %ax, %ax               # Push a 0x0000 return address as DOS does for us
    push %ax                   #     when initializing our program. Memory address
                               #     CS:0x0000 contains an Int 20h instruction to exit
    add $EXTRA_SEG, %bp        # BP = segment where our extra data areas starts

    # Do stuff. Just an example:    
    lea 0x0000(%bp), %si       # SI=segment of first 64KiB segment we allocated
    lea 0x1000(%bp), %di       # DI=segment of second 64KiB segment we allocated

    jmp .exit

.no_mem:
    mov $no_mem_str, %dx       # Have DOS print an error and exit.
    mov $9, %ah
    int $0x21

.exit:
    ret                        # We're done

no_mem_str: .asciz "Out of memory\r\n$"

_end:

# Length of non-BSS Code and Data
CODE_DATA_LEN   = _end-_start

# Segment number after the PSP/code/non-BSS data/stack relative to start of program
EXTRA_SEG       = (CODE_DATA_LEN+COM_ORG+STACK_SIZE+PARA_SIZE-1)/PARA_SIZE

# Size of the total program in paragraphs
PROG_SIZE_PARA  = EXTRA_SEG+EXTRA_SIZE_PARA

# New Stack offset(SP) will be moved just below extra data
STACK_TOP_OFS   = EXTRA_SEG*PARA_SIZE

# Size of the extra memory region in paragraphs
EXTRA_SIZE_PARA = (EXTRA_SIZE+PARA_SIZE-1)/PARA_SIZE

These samples can be assembled and linked to a program called myprog.com with:

as --32 myprog.s -o myprog.o
ld -melf_i386 -Ttext=0x100 --oformat=binary myprog.o -o myprog.com

Allocating 128KiB in a DOS EXE Program

The DOS loader also loads EXE programs (they have an MZ header). The MZ header contains program information, relocation tables, stack, entry point, and minimum and maximum memory allocation requirements beyond the data physically present in the executable file. Segments with entirely uninitialized data (including but not limited to BSS and Stack segments) don't occupy space in the executable file, but the DOS loader is told to allocate extra memory through the MINALLOC and MAXALLOC header fields:

MINALLOC. This word indicates the minimum number of paragraphs the program requires to begin execution. This is in addition to the memory required to hold the load module. This value normally represents the total size of any uninitialised data and/or stack segments that are linked at the end of a program. This space is not directly included in the load module, since there are no particular initialising values and it would simply waste disk space.

MAXALLOC. This word indicates the maximum number of paragraphs that the program would like allocated to it before it begins execution. This indicates additional memory over and above that required by the load module and the value specified by MINALLOC. If the request cannot be satisfied, the program is allocated as much memory as is available

MINALLOC is the number of paragraphs required above the code and data in the EXE itself. MAXALLOC is always at least equal to MINALLOC, but if MAXALLOC > MINALLOC then DOS will attempt to fulfill the request for the additional paragraphs (MAXALLOC-MINALLOC). If that request can't be honoured then DOS will allocate all the available space it does have. Many tools and programming languages call the extra memory between MINALLOC and MAXALLOC the HEAP.
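To see where these two fields live, here is a small Python sketch that pulls MINALLOC and MAXALLOC out of the first bytes of an MZ header (they are the 16-bit little-endian words at offsets 0x0A and 0x0C). The short field names and the sample header values are my own labels for the example, not output from any particular linker.

```python
import struct

# First nine fields of the MZ header: a 2-byte signature followed by
# eight little-endian 16-bit words.
MZ_FIELDS = ("magic", "cblp", "cp", "crlc", "cparhdr",
             "minalloc", "maxalloc", "ss", "sp")


def read_mz_alloc(header):
    """Return (MINALLOC, MAXALLOC) in paragraphs from raw header bytes."""
    values = struct.unpack_from("<2s8H", header, 0)
    hdr = dict(zip(MZ_FIELDS, values))
    if hdr["magic"] != b"MZ":
        raise ValueError("not an MZ executable")
    return hdr["minalloc"], hdr["maxalloc"]


# Made-up header: needs 0x2100 extra paragraphs, asks for as much as possible
sample = struct.pack("<2s8H", b"MZ", 0, 1, 0, 2, 0x2100, 0xFFFF, 0, 0x1000)
min_paras, max_paras = read_mz_alloc(sample)
print(hex(min_paras), hex(max_paras))  # 0x2100 0xffff
```

For a real file you would pass in the first 18 or more bytes read from the EXE in binary mode.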

It is worth noting that it is the final linking process that generates the executable that sets MINALLOC and MAXALLOC. Often the linker by default sets MAXALLOC to 0xffff effectively requesting the HEAP take up as much contiguous space as DOS can allocate. The EXEMOD program was designed to allow this to be changed:

EXEMOD

EXEMOD displays or changes fields in the DOS file header. To use this utility, you must understand the DOS conventions for file header

[snip]

/MIN n

Sets the minimum allocation value to n, where n is a hexadecimal value setting the number of paragraphs. The actual value set may be different from the requested value if adjustments are necessary to accommodate the stack.

/MAX n

Sets the maximum allocation to n, where n is a hexadecimal value setting the number of paragraphs. The maximum allocation value must be greater than or equal to the minimum allocation value. This option has the same effect as the linker parameter /CPARMAXALLOC.
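The kind of header change described in that excerpt can be sketched in Python. This is not EXEMOD's implementation and it ignores the stack adjustment the documentation mentions; set_alloc is a hypothetical helper that only rewrites the two words at offsets 0x0A and 0x0C in a copy of the header bytes.

```python
import struct

ALLOC_OFFSET = 0x0A  # MINALLOC at 0x0A, MAXALLOC at 0x0C


def set_alloc(header, minalloc=None, maxalloc=None):
    """Return a copy of the MZ header with MINALLOC/MAXALLOC replaced."""
    buf = bytearray(header)
    if bytes(buf[:2]) != b"MZ":
        raise ValueError("not an MZ executable")
    cur_min, cur_max = struct.unpack_from("<2H", buf, ALLOC_OFFSET)
    new_min = cur_min if minalloc is None else minalloc
    new_max = cur_max if maxalloc is None else maxalloc
    if new_max < new_min:
        raise ValueError("maximum allocation must be >= minimum allocation")
    struct.pack_into("<2H", buf, ALLOC_OFFSET, new_min, new_max)
    return bytes(buf)


# Made-up header that asks for everything (MAXALLOC=0xffff); cap it at
# exactly the minimum so DOS leaves the rest of memory free.
original = struct.pack("<2s8H", b"MZ", 0, 1, 0, 2, 0x2100, 0xFFFF, 0, 0x1000)
patched = set_alloc(original, maxalloc=0x2100)
print(struct.unpack_from("<2H", patched, ALLOC_OFFSET))  # (8448, 8448)
```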

In DOS < 2.0 that didn't have the concept of memory control blocks, using EXEMOD was the method to change the additional memory requirements of a DOS executable. In DOS 2.0+ a program (at run-time) can allocate new memory, resize memory, and free memory through DOS Int 21h functions.

For this discussion the 128KiB of extra memory is required by the program, so the examples will place that data in uninitialised data segments. The linking/executable generation process will adjust the MINALLOC field in the MZ header by adding the extra paragraphs needed.

The first example of a DOS program that wishes to allocate 128KiB (two 64KiB segments placed one after the other) is written in FASM assembly:

format MZ                      ; DOS EXE Program

stack 4096                     ; 4KiB stack. FASM puts stack after BSS data

entry code:main                ; Program entry point (seg:offset)

segment code
main:
    push ds
    pop ax
    mov bx, EndSeg
    sub bx, ax                 ; BX = size of program in paragraphs (EndSeg-DS)
    mov ah, 4ah                ; Resize to the number of paragraphs we need
    int 21h                    ;     because the DOS loader sometimes allocates slightly
                               ;     more than our actual program requirements

    ; Do Stuff. Just an example:    
    mov si, ExtraSeg1          ; SI=segment of first 64KiB segment we allocated
    mov di, ExtraSeg2          ; DI=segment of second 64KiB segment we allocated

    mov ax, 4c00h              ; We're done, have DOS exit and return 0
    int 21h

segment ExtraSeg1
rb 65536                       ; Reserve 65536 uninitialized "bytes" in BSS area

segment ExtraSeg2
rb 65536                       ; Reserve 65536 uninitialized "bytes" in BSS area

segment EndSeg                 ; Use this segment to determine last segment of our program
                               ;     Segments with no data will be put in BSS after
                               ;     other BSS segments

A version that should work with most releases of MASM/JWASM/TASM would look like:

.model compact, C              ; Multiple data segments, one code segment
.stack 4096                    ; 4KiB stack

; fardata? are uninitialized segments (like BSS)
.fardata? ExtraSeg1            ; Allocate first 64KiB in a new far segment
db 65535 DUP(?)                ; Some old assemblers don't support 65536! Set to 65535
                               ; The next segment will be aligned to a paragraph boundary
                               ; Uninitialized data `?` will not be physically in our EXE

.fardata? ExtraSeg2            ; Allocate second 64KiB in a new far segment after first
db 65535 DUP(?)                ; Some old MASM assemblers don't support 65536! Set to 65535
                               ; The next segment will be aligned to a paragraph boundary
                               ; Uninitialized data `?` will not be physically in our EXE


.fardata? EndSeg               ; Use this segment to determine last segment of our program
                               ;     Segments with no data will be put in BSS after
                               ;     other BSS segments
.code
main PROC
    push ds
    pop ax
    mov bx, EndSeg
    sub bx, ax                 ; BX = size of program in paragraphs (EndSeg-DS)
    mov ah, 4ah                ; Resize to the number of paragraphs we need
    int 21h                    ;     because the DOS loader sometimes will allocate 
                               ;     slightly more than our actual program requirements

    ; Do Stuff. Just an example:
    mov si, ExtraSeg1          ; SI=segment of first 64KiB segment we allocated
    mov di, ExtraSeg2          ; DI=segment of second 64KiB segment we allocated

    mov ax, 4c00h              ; We're done, have DOS exit and return 0
    int 21h
main ENDP

END main                       ; Program entry point is main

Footnotes:

  • 1. When there is less than 64KiB of free memory left to DOS, SP will be set to grow down from an offset below the top of DOS's available free memory. When there is 64KiB or more of free memory available the DOS loader sets SP to 0x0000. In that case the first push of data (the return address 0x0000) wraps SP to the top of the segment at 0xfffe (0x0000-2). This is a real mode quirk: if you set SS:SP to SS:0x0000 the first value pushed will be placed at SS:0xFFFE at the top of the SS segment.
  • 2. Although 0xa000:0x0000 is often seen as the upper end of the contiguous conventional memory usable by DOS, it doesn't have to be. Some memory managers (JEMMEX, QEMM, and 386Max, to name a few) and their tools can move the EBDA (on equipment where that doesn't cause problems). If they are also told that the VGA/EGA memory at 0xa000:0x0000 to 0xa000:0xffff is unused, they can move the upper end of contiguous memory for DOS allocations to 0xb000:0x0000. In a headless (no video) configuration it is possible to have even more. A 386 memory manager that does this usually runs DOS in v8086 mode and remaps extended memory (using the 386's support for paging) into the unused areas between 0xa000:0x0000 and 0xf000:0xffff.
Michael Petch answered Sep 30 '22 14:09
