Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mmap() an entire large file

Tags:

c

mmap

I am trying to "mmap" a binary file (~ 8Gb) using the following code (test.c).

#include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h>  #define handle_error(msg) \   do { perror(msg); exit(EXIT_FAILURE); } while (0)  int main(int argc, char *argv[]) {    const char *memblock;    int fd;    struct stat sb;     fd = open(argv[1], O_RDONLY);    fstat(fd, &sb);    printf("Size: %lu\n", (uint64_t)sb.st_size);     memblock = mmap(NULL, sb.st_size, PROT_WRITE, MAP_PRIVATE, fd, 0);    if (memblock == MAP_FAILED) handle_error("mmap");     for(uint64_t i = 0; i < 10; i++)    {      printf("[%lu]=%X ", i, memblock[i]);    }    printf("\n");    return 0; } 

test.c is compiled using gcc -std=c99 test.c -o test and file of test returns: test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

Although this works fine for small files, I get a segmentation fault when I try to load a big one. The program actually returns:

Size: 8274324021  mmap: Cannot allocate memory 

I managed to map the whole file using boost::iostreams::mapped_file but I want to do it using C and system calls. What is wrong with my code?

like image 664
Emer Avatar asked Aug 28 '11 16:08

Emer


People also ask

What are the advantages of using mmap ()?

Advantages of mmap( ) Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory. When multiple processes map the same object into memory, the data is shared among all the processes.

How do I resize mmap?

The simple way is to use two MAP_SHARED mappings (grow the file, then create a second mapping that includes the grown region) in the same process over the same file and then unmap the old mapping once all readers that could access it are finished.

Why is mmap faster than read?

What mmap helps with is that there is no extra user space buffer involved, the "read" takes place there where the OS kernel sees fit and in chunks that can be optimized. This may be an advantage in speed, but first of all this is just an interface that is easier to use.

Does mmap load file into memory?

Yes, mmap creates a mapping. It does not normally read the entire content of whatever you have mapped into memory. If you wish to do that you can use the mlock/mlockall system call to force the kernel to read into RAM the content of the mapping, if applicable.


2 Answers

MAP_PRIVATE mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something too much larger than your physical ram + swap. Try using a MAP_SHARED mapping instead. This means that writes to the mapping will be reflected on disk - as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.

I also note that you're mapping with PROT_WRITE, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY - this itself may be another problem for you; you must specify O_RDWR if you want to use PROT_WRITE with MAP_SHARED.

As for PROT_WRITE only, this happens to work on x86, because x86 doesn't support write-only mappings, but may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE - or, if you only need to read, PROT_READ.

On my system (VPS with 676MB RAM, 256MB swap), I reproduced your problem; changing to MAP_SHARED results in an EPERM error (since I'm not allowed to write to the backing file opened with O_RDONLY). Changing to PROT_READ and MAP_SHARED allows the mapping to succeed.

If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap and remap with MAP_PRIVATE the areas you intend to write to. Of course, if you intend to write to the entire file then you need 8GB of memory to do so.

Alternately, you can write 1 to /proc/sys/vm/overcommit_memory. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.

like image 64
bdonlan Avatar answered Sep 16 '22 15:09

bdonlan


Linux (and apparently a few other UNIX systems) have the MAP_NORESERVE flag for mmap(2), which can be used to explicitly enable swap space overcommitting. This can be useful when you wish to map a file larger than the amount of free memory available on your system.

This is particularly handy when used with MAP_PRIVATE and only intend to write to a small portion of the memory mapped range, since this would otherwise trigger swap space reservation of the entire file (or cause the system to return ENOMEM, if system wide overcommitting hasn't been enabled and you exceed the free memory of the system).

The issue to watch out for is that if you do write to a large portion of this memory, the lazy swap space reservation may cause your application to consume all the free RAM and swap on the system, eventually triggering the OOM killer (Linux) or causing your app to receive a SIGSEGV.

like image 31
dcoles Avatar answered Sep 16 '22 15:09

dcoles