
Why do I get triple fault when trying to handle an exception on 286 but not on a modern CPU nor Bochs?

I'm trying to initialize protected mode with exception handling on an AMD 286 system. I've debugged the code below on Bochs, and it works fine there. It also works when run on a Pentium 4 machine. But on the 286 it triple faults as soon as it reaches the int3 instruction. The observable behavior is: if I comment out int3, "OK" stays on the screen indefinitely, while with the code as is, the system reboots.

The code is to be compiled by FASM, and the binary put into a boot sector of an HDD or FDD. I'm actually running it from a 1.4M floppy.

 org 0x7c00
 use16

 CODE_SELECTOR     = code_descr - gdt
 DATA_SELECTOR     = data_descr - gdt

    ; print "OK" on the screen to see that we've actually started
    push     0xb800
    pop      es
    xor      di,di
    mov      ax, 0x0700+'O'
    stosw
    mov      ax, 0x0700+'K'
    stosw
    ; clear the rest of the screen
    mov      cx, 80*25*2-2
    mov      ax, 0x0720
    rep stosw

    lgdt     [cs:gdtr]
    cli
    smsw     ax
    or       al, 1
    lmsw     ax
    jmp      CODE_SELECTOR:enterPM
enterPM:
    lidt     [idtr]
    mov      cx, DATA_SELECTOR
    mov      es, cx
    mov      ss, cx
    mov      ds, cx

    int3     ; cause an exception
    jmp      $

intHandler:
    jmp      $

gdt:
    dq       0
data_descr:
    dw       0xffff     ; limit
    dw       0x0000     ; base 15:0
    db       0x00       ; base 23:16
    db       10010011b  ; present, ring0, non-system, data, extending upwards, writable, accessed
    dw       0          ; reserved on 286
code_descr:
    dw       0xffff     ; limit
    dw       0x0000     ; base 15:0
    db       0x00       ; base 23:16
    db       10011011b  ; present, ring0, non-system, code, non-conforming, readable, accessed
    dw       0          ; reserved on 286

gdtr:
    dw       gdtr-gdt-1
 gdtBase:
    dd       gdt

idt:
 rept 14
 {
    dw       intHandler
    dw       CODE_SELECTOR
    db       0
    db       11100111b    ; present, ring3, system, 16-bit trap gate
    dw       0            ; reserved on 286
 }
idtr:
    dw       idtr-idt-1
 idtBase:
    dd       idt

finish:
    db       (0x7dfe-finish) dup(0)
    dw       0xaa55

I suppose I'm using some CPU feature that the 286 doesn't support, but what exactly and where?

Ruslan asked Aug 17 '19 07:08




1 Answer

  • In your protected mode code you have:

    lidt     [idtr]
    mov      cx, DATA_SELECTOR
    mov      es, cx
    mov      ss, cx
    mov      ds, cx
    

    This relies on DS having been 0x0000 before entering protected mode (so that the base address in the DS descriptor cache is still 0) at the point where lidt [idtr] executes. That instruction uses DS implicitly. Place the lidt instruction after you have loaded the segment registers with the new selectors, not before.

  • Although it didn't manifest itself as a bug on your hardware, in real mode your code also relies on CS being 0x0000 for the instruction lgdt [cs:gdtr]. CS being 0x0000 isn't guaranteed: some BIOSes reach the bootloader with a nonzero CS. For example, 0x07c0:0x0000 also refers to physical address 0x07c00 ((0x07c0<<4)+0x0000 = 0x07c00). In the real-mode code I'd recommend setting DS to zero and using lgdt [gdtr].

  • Once in protected mode, and before anything uses the stack, you should set SP. Interrupts require the stack pointer to be somewhere valid. Initializing it to 0x0000 would have the stack grow down from the top of the 64KiB segment. You shouldn't rely on it happening to point somewhere that won't interfere with your running system once in protected mode (i.e. on top of your bootloader code/data).

  • Before using any of the string instructions like STOS/SCAS/CMPS/LODS you should ensure that the Direction Flag is set the way you expect. Since you rely on forward movement, clear it with CLD. You shouldn't assume that the Direction Flag is clear upon entry to your bootloader.
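
Put together, the four points above might look like the following. This is a minimal sketch in FASM syntax, not verified on real 286 hardware. The descriptor tables and padding are copied from the question. The choice of SP = 0 and the word count used to blank the screen are assumptions made for this sketch.

```asm
 org 0x7c00
 use16

 CODE_SELECTOR     = code_descr - gdt
 DATA_SELECTOR     = data_descr - gdt

    cld                       ; string instructions below rely on DF = 0
    xor      ax, ax
    mov      ds, ax           ; DS = 0 so [gdtr] matches org 0x7c00 whatever CS is

    ; print "OK" and blank the rest of the 80x25 text page
    push     0xb800
    pop      es
    xor      di, di
    mov      ax, 0x0700+'O'
    stosw
    mov      ax, 0x0700+'K'
    stosw
    mov      cx, 80*25-2      ; count is in words (character cells)
    mov      ax, 0x0720
    rep stosw

    cli
    lgdt     [gdtr]           ; DS-relative, no CS override needed
    smsw     ax
    or       al, 1
    lmsw     ax
    jmp      CODE_SELECTOR:enterPM
enterPM:
    mov      cx, DATA_SELECTOR
    mov      ds, cx           ; reload DS first so its cached base is known to be 0
    mov      es, cx
    mov      ss, cx
    xor      sp, sp           ; assumption: stack grows down from the top of the 64KiB segment
    lidt     [idtr]           ; now addressed through the freshly loaded DS

    int3                      ; should reach intHandler instead of resetting
    jmp      $

intHandler:
    jmp      $

gdt:
    dq       0
data_descr:
    dw       0xffff           ; limit
    dw       0x0000           ; base 15:0
    db       0x00             ; base 23:16
    db       10010011b        ; present, ring0, data, writable, accessed
    dw       0                ; reserved on 286
code_descr:
    dw       0xffff           ; limit
    dw       0x0000           ; base 15:0
    db       0x00             ; base 23:16
    db       10011011b        ; present, ring0, code, readable, accessed
    dw       0                ; reserved on 286

gdtr:
    dw       gdtr-gdt-1
    dd       gdt

idt:
 rept 14
 {
    dw       intHandler
    dw       CODE_SELECTOR
    db       0
    db       11100111b        ; present, ring3, 16-bit trap gate
    dw       0                ; reserved on 286
 }
idtr:
    dw       idtr-idt-1
    dd       idt

finish:
    db       (0x7dfe-finish) dup(0)
    dw       0xaa55
```

The only changes from the question's code are in the startup sequence: CLD and DS = 0 at entry, lgdt without the CS override, and lidt moved after the segment registers and SP have been loaded.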

Many of these issues are captured in my General Bootloader Tips in another Stack Overflow answer.

Michael Petch answered Oct 28 '22 20:10