I am trying to "mmap" a binary file (~ 8Gb) using the following code (test.c).
#include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define handle_error(msg) \ do { perror(msg); exit(EXIT_FAILURE); } while (0) int main(int argc, char *argv[]) { const char *memblock; int fd; struct stat sb; fd = open(argv[1], O_RDONLY); fstat(fd, &sb); printf("Size: %lu\n", (uint64_t)sb.st_size); memblock = mmap(NULL, sb.st_size, PROT_WRITE, MAP_PRIVATE, fd, 0); if (memblock == MAP_FAILED) handle_error("mmap"); for(uint64_t i = 0; i < 10; i++) { printf("[%lu]=%X ", i, memblock[i]); } printf("\n"); return 0; }
test.c is compiled using gcc -std=c99 test.c -o test
and file
of test returns: test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
Although this works fine for small files, I get a segmentation fault when I try to load a big one. The program actually returns:
Size: 8274324021 mmap: Cannot allocate memory
I managed to map the whole file using boost::iostreams::mapped_file but I want to do it using C and system calls. What is wrong with my code?
Advantages of mmap( ) Aside from any potential page faults, reading from and writing to a memory-mapped file does not incur any system call or context switch overhead. It is as simple as accessing memory. When multiple processes map the same object into memory, the data is shared among all the processes.
The simple way is to use two MAP_SHARED mappings (grow the file, then create a second mapping that includes the grown region) in the same process over the same file and then unmap the old mapping once all readers that could access it are finished.
What mmap helps with is that there is no extra user space buffer involved, the "read" takes place there where the OS kernel sees fit and in chunks that can be optimized. This may be an advantage in speed, but first of all this is just an interface that is easier to use.
Yes, mmap creates a mapping. It does not normally read the entire content of whatever you have mapped into memory. If you wish to do that you can use the mlock/mlockall system call to force the kernel to read into RAM the content of the mapping, if applicable.
MAP_PRIVATE
mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something too much larger than your physical ram + swap. Try using a MAP_SHARED
mapping instead. This means that writes to the mapping will be reflected on disk - as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.
I also note that you're mapping with PROT_WRITE
, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY
- this itself may be another problem for you; you must specify O_RDWR
if you want to use PROT_WRITE
with MAP_SHARED
.
As for PROT_WRITE
only, this happens to work on x86, because x86 doesn't support write-only mappings, but may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE
- or, if you only need to read, PROT_READ
.
On my system (VPS with 676MB RAM, 256MB swap), I reproduced your problem; changing to MAP_SHARED
results in an EPERM
error (since I'm not allowed to write to the backing file opened with O_RDONLY
). Changing to PROT_READ
and MAP_SHARED
allows the mapping to succeed.
If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap
and remap with MAP_PRIVATE
the areas you intend to write to. Of course, if you intend to write to the entire file then you need 8GB of memory to do so.
Alternately, you can write 1
to /proc/sys/vm/overcommit_memory
. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.
Linux (and apparently a few other UNIX systems) have the MAP_NORESERVE
flag for mmap(2), which can be used to explicitly enable swap space overcommitting. This can be useful when you wish to map a file larger than the amount of free memory available on your system.
This is particularly handy when used with MAP_PRIVATE
and only intend to write to a small portion of the memory mapped range, since this would otherwise trigger swap space reservation of the entire file (or cause the system to return ENOMEM
, if system wide overcommitting hasn't been enabled and you exceed the free memory of the system).
The issue to watch out for is that if you do write to a large portion of this memory, the lazy swap space reservation may cause your application to consume all the free RAM and swap on the system, eventually triggering the OOM killer (Linux) or causing your app to receive a SIGSEGV
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With