
When does a memory load cause a bus error on x86-64 Linux?

I used to think that x86-64 supports unaligned memory access and that an invalid memory access always causes a segmentation fault (except, perhaps, for SIMD instructions like movdqa or movaps). Nevertheless, I recently observed a bus error with a normal mov instruction. Here is a reproducer:

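void test(void *a)
{
    /* Load the pointer into rbp, then read 8 bytes through it. */
    asm("mov %0, %%rbp\n\t"
        "mov 0(%%rbp), %%rdx\n\t"
        : : "r"(a) : "rbp", "rdx");
}

int main(void)
{
    /* Address taken from the core dump of the buggy program. */
    test((void *)0x706a2e3630332d69);
    return 0;
}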

(must be compiled with frame pointer omission, e.g. gcc -O test.c && ./a.out).

The mov 0(%rbp), %rdx instruction and the address 0x706a2e3630332d69 were copied from a core dump of the buggy program. Changing the address to 0 causes a segfault, but merely aligning it to 0x706a2e3630332d60 still gives a bus error (my guess is that this is related to the address space being 48-bit on x86-64).
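To make the 48-bit guess concrete: assuming 4-level paging (48-bit virtual addresses), an x86-64 address is "canonical" only when bits 63 through 47 are all equal. A minimal sketch of that check follows; is_canonical and VA_BITS are names made up for illustration, and CPUs with 5-level paging would need 57 instead of 48.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (4-level paging). */
#define VA_BITS 48

/* Hypothetical helper: true when bits 63..47 of addr are all 0 or all 1. */
static bool is_canonical(uint64_t addr)
{
    /* Keep bits 63..47, which is 64 - VA_BITS + 1 = 17 bits. */
    uint64_t top = addr >> (VA_BITS - 1);
    uint64_t all_ones = (UINT64_C(1) << (64 - VA_BITS + 1)) - 1;

    return top == 0 || top == all_ones;
}
```

Both 0x706a2e3630332d69 and its aligned variant 0x706a2e3630332d60 fail this check, while 0 passes it (it is canonical, just not mapped).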

The question is: which addresses cause a bus error (SIGBUS)? Is this determined by the architecture or configured by the OS kernel (i.e. in the page tables, control registers or something similar)?

Mikhail Maltsev asked Nov 30 '22 16:11



1 Answer

SIGBUS is in a sad state. There's no consensus between operating systems about what it should mean, and when it is generated varies wildly between operating systems, CPU architectures, configurations and the phase of the moon. Unless you work with a very specific configuration, you should just treat it as "just like SIGSEGV, but different".
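In code, "just like SIGSEGV, but different" can amount to installing one handler for both signals and only using the signal number and si_code for diagnostics. The following is a rough sketch under that assumption; fault_name, report_fault and install_fault_handler are hypothetical names, and only the si_code constants that POSIX defines are covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: name the POSIX si_code values for both signals.
 * The codes overlap numerically, so branch on the signal number first. */
static const char *fault_name(int signo, int code)
{
    if (signo == SIGBUS) {
        switch (code) {
        case BUS_ADRALN: return "SIGBUS/BUS_ADRALN (invalid address alignment)";
        case BUS_ADRERR: return "SIGBUS/BUS_ADRERR (nonexistent physical address)";
        case BUS_OBJERR: return "SIGBUS/BUS_OBJERR (object-specific hardware error)";
        default:         return "SIGBUS/other";
        }
    }
    if (signo == SIGSEGV) {
        switch (code) {
        case SEGV_MAPERR: return "SIGSEGV/SEGV_MAPERR (address not mapped)";
        case SEGV_ACCERR: return "SIGSEGV/SEGV_ACCERR (invalid permissions)";
        default:          return "SIGSEGV/other";
        }
    }
    return "not a memory fault";
}

/* Shared handler: print a short diagnostic with write() and exit. */
static void report_fault(int signo, siginfo_t *info, void *context)
{
    const char *msg = fault_name(signo, info->si_code);
    ssize_t ignored;

    (void)context;
    ignored = write(STDERR_FILENO, msg, strlen(msg));
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;
    _exit(1);
}

/* Install the same handler for SIGSEGV and SIGBUS; returns 0 on success. */
static int install_fault_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Calling install_fault_handler() early in main is enough; which of the two signals actually arrives for a given bad access still depends on the system.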

I suspect that originally it was supposed to mean "you tried a memory access that could not possibly succeed no matter what the kernel does"; in other words, the exact bit pattern in the address can never be a valid memory access. Most commonly this would mean an unaligned access on strict-alignment architectures. Then some systems started using it for accesses to virtual address space that doesn't exist (as in your example: the address you have can't exist). Then, by accident, some systems made it also mean that userland tried to touch kernel memory (since, at least technically, that is virtual address space that doesn't exist from userland's point of view). Then it became just random.

Other than that, I've seen SIGBUS from:

  • access to a non-existent physical address in mmap'ed hardware.
  • exec of a non-exec mapping.
  • access to a perfectly valid mapping where overcommitted memory couldn't be faulted in at that moment (I've seen SIGSEGV, SIGKILL and SIGBUS here; at least one operating system behaves differently depending on which architecture you're on).
  • memory management deadlocks (and other "something went horribly wrong, but we don't know what" memory management errors).
  • stack red zone access.
  • hardware errors (ECC memory, PCI bus parity errors, etc.).
  • access to an mmap'ed file where the file contents don't exist (past the end of the file or in a hole).
  • access to an mmap'ed file where the file contents should exist, but don't (I/O errors).
  • access to normal memory that got swapped out where the swap-in couldn't be performed (I/O error).
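
The "past the end of the file" case in the list above is probably the easiest one to reproduce. The sketch below assumes Linux, where mmap(2) documents SIGBUS for pages of a mapping that lie wholly beyond the end of the file; other systems may behave differently. signal_from_read_past_eof and the temporary file name are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_exit;
static volatile sig_atomic_t caught_signal;

/* Record which signal arrived and unwind back to the sigsetjmp point. */
static void jump_out_of_fault(int signo)
{
    caught_signal = signo;
    siglongjmp(fault_exit, 1);
}

/* Hypothetical demo: map two pages of a 1-byte file and read the second.
 * Returns the signal that was raised, 0 if none, or -1 on setup failure. */
static int signal_from_read_past_eof(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears once closed */
    if (ftruncate(fd, 1) != 0) {        /* only the first page has contents */
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, 2 * (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_out_of_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    caught_signal = 0;
    if (sigsetjmp(fault_exit, 1) == 0) {
        /* The second page is entirely past the end of the file. */
        volatile char byte = *(volatile char *)(map + page);
        (void)byte;
    }

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap(map, 2 * (size_t)page);
    return caught_signal;
}
```

Comparing the return value against SIGBUS and SIGSEGV shows which of the two a given system picks for this case.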
Art answered Dec 05 '22 07:12
