
Why does a read operation on a memory mapped zero byte file lead to SIGBUS?

Tags: c, mmap, sigbus

Here is the example code I wrote.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main()
{
    int fd;
    long pagesize;
    char *data;

    if ((fd = open("foo.txt", O_RDONLY)) == -1) {
        perror("open");
        return 1;
    }

    pagesize = sysconf(_SC_PAGESIZE);
    printf("pagesize: %ld\n", pagesize);

    data = mmap(NULL, pagesize, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("data: %p\n", (void *) data);

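    /* Offsets inside the one-page mapping. */
    printf("%d\n", data[0]);
    printf("%d\n", data[1]);
    printf("%d\n", data[2]);
    /* Offsets past the mapping when the page size is 4096. */
    printf("%d\n", data[4096]);
    printf("%d\n", data[4097]);
    printf("%d\n", data[4098]);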

    return 0;
}

If I provide a zero-byte foo.txt to this program, it terminates with SIGBUS.

$ > foo.txt && gcc foo.c && ./a.out 
pagesize: 4096
data: 0x7f8d882ab000
Bus error

If I provide a one-byte foo.txt to this program, then there is no such issue.

$ printf A > foo.txt && gcc foo.c && ./a.out 
pagesize: 4096
data: 0x7f5f3b679000
65
0
0
48
56
10

mmap(2) mentions the following.

Use of a mapped region can result in these signals:

SIGSEGV Attempted write into a region mapped as read-only.

SIGBUS Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).

So if I understand this correctly, even the second test case (1-byte file) should have led to SIGBUS because data[1] and data[2] are trying to access a portion of the buffer (data) that does not correspond to the file.

Can you help me understand why only a zero byte file causes this program to fail with SIGBUS?

asked Jan 01 '17 13:01 by Lone Learner


2 Answers

You get SIGBUS when you access a mapped page that lies entirely beyond the end of the file. The POSIX standard states:

The mmap() function can be used to map a region of memory that is larger than the current size of the object. Memory access within the mapping but beyond the current end of the underlying objects may result in SIGBUS signals being sent to the process.

With a zero-byte file, the entire page you mapped is "beyond the current end of the underlying object". So you get SIGBUS.
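One way to avoid touching such a page is to check the file size before mapping. The following is only a minimal sketch: map_readonly is a hypothetical helper name, not part of any library, and it assumes nobody truncates the file after the fstat() call (a later truncation can still produce SIGBUS).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and stores the file size in
 * *len. Returns NULL for an empty file or on any error, so the caller
 * never dereferences a page that lies wholly beyond end-of-file. */
static const char *map_readonly(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    *len = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return NULL;            /* error, or nothing to map */
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

A caller would then read only offsets 0 through len - 1 and treat a NULL return as "empty or unreadable".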

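You do NOT get a SIGBUS when you read beyond the 4kB page you've mapped (data[4096] and up) because those addresses are not within your mapping at all. That access is undefined behavior: it appears to have succeeded here only because some other mapping happened to sit at the adjacent address, and it would raise SIGSEGV if nothing were mapped there. You also don't get a SIGBUS anywhere inside your mapping when the file is larger than zero bytes, because the page that contains the end of the file is mapped in full.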

But you would get a SIGBUS if you mapped additional pages past the end of the file, such as mapping two 4kB pages for a 1-byte file. If you access that second 4kB page, you'd get SIGBUS.
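The two-page case can be checked without crashing the main program by doing the read in a child process. This is a rough sketch for Linux with a regular file; open_with_size and probe_read are hypothetical helper names introduced only for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: creates `path` holding `n` bytes of 'A' and
 * returns an open descriptor for it, or -1 on failure. */
static int open_with_size(const char *path, size_t n)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (write(fd, "A", 1) != 1) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Hypothetical helper: maps `maplen` bytes of `fd` and reads one byte at
 * `offset` in a child process, so a fatal signal kills only the child.
 * Returns 0 if the read succeeded, the signal number if the child was
 * killed by a signal, or -1 on any other failure. */
static int probe_read(int fd, size_t maplen, size_t offset)
{
    assert(offset < maplen);

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {                         /* child */
        void *m = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, 0);
        if (m == MAP_FAILED)
            _exit(2);
        volatile char *p = m;               /* volatile: keep the read */
        char c = p[offset];                 /* may raise SIGBUS */
        (void)c;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With a 1-byte file and a two-page mapping, probe_read(fd, 2 * pagesize, 1) should return 0, while probe_read(fd, 2 * pagesize, pagesize) should return SIGBUS. With a zero-byte file, even offset 0 of a one-page mapping should return SIGBUS.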

answered Oct 21 '22 16:10 by Andrew Henle


A 1-byte file does not lead to the crash because mmap will map memory in multiples of the page size and zero the remainder. From the man page:

A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped, and writes to that region are not written out to the file. The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified.
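Both halves of that statement can be observed directly. The sketch below assumes Linux and a writable working directory; write_past_eof is a hypothetical helper written for this illustration. It creates a 1-byte file, checks that the rest of the page reads as zero, stores a byte past end-of-file through the mapping, and reports the file size afterwards.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a 1-byte file at `path`, maps one page of
 * it, writes past end-of-file through the mapping, and returns the file
 * size afterwards (-1 on error). *tail_zero is set to 1 if bytes
 * 1..pagesize-1 read as 0 before the write, else 0. */
static long write_past_eof(const char *path, int *tail_zero)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 1 && tail_zero != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return -1;
    }

    char *data = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return -1;
    }

    *tail_zero = 1;
    for (long i = 1; i < pagesize; i++)
        if (data[i] != 0)
            *tail_zero = 0;

    data[1] = 'B';                  /* changes the page, not the file */
    msync(data, (size_t)pagesize, MS_SYNC);

    struct stat st;
    long size = (fstat(fd, &st) == 0) ? (long)st.st_size : -1;

    munmap(data, (size_t)pagesize);
    close(fd);
    return size;
}
```

The function should return 1 with *tail_zero set to 1: the remainder of the page reads as zero, and the store to data[1] does not make the file any longer.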

answered Oct 21 '22 16:10 by milgner