Mmap() an entire large file

Tags:

c

mmap

I am trying to "mmap" a binary file (~ 8Gb) using the following code (test.c).

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    #define handle_error(msg) \
      do { perror(msg); exit(EXIT_FAILURE); } while (0)

    int main(int argc, char *argv[])
    {
        const char *memblock;
        int fd;
        struct stat sb;

        fd = open(argv[1], O_RDONLY);
        fstat(fd, &sb);
        printf("Size: %lu\n", (uint64_t)sb.st_size);

        memblock = mmap(NULL, sb.st_size, PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (memblock == MAP_FAILED) handle_error("mmap");

        for(uint64_t i = 0; i < 10; i++)
        {
            printf("[%lu]=%X ", i, memblock[i]);
        }
        printf("\n");
        return 0;
    }

test.c is compiled using gcc -std=c99 test.c -o test and file of test returns: test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

Although this works fine for small files, I get a segmentation fault when I try to load a big one. The program actually returns:

    Size: 8274324021
    mmap: Cannot allocate memory

I managed to map the whole file using boost::iostreams::mapped_file but I want to do it using C and system calls. What is wrong with my code?

asked Aug 28 '11 16:08

Emer

2 Answers

Writable MAP_PRIVATE mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something much larger than your physical RAM + swap. Try using a MAP_SHARED mapping instead. This means that writes to the mapping will be reflected on disk; as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.

I also note that you're mapping with PROT_WRITE, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY - this itself may be another problem for you; you must specify O_RDWR if you want to use PROT_WRITE with MAP_SHARED.

As for PROT_WRITE only, this happens to work on x86, because x86 doesn't support write-only mappings, but may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE - or, if you only need to read, PROT_READ.

On my system (a VPS with 676MB RAM and 256MB swap), I reproduced your problem; changing to MAP_SHARED results in an EACCES error ("Permission denied"), since I'm not allowed to write to the backing file opened with O_RDONLY. Changing to PROT_READ and MAP_SHARED allows the mapping to succeed.
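A minimal sketch of the read-only variant, assuming you only need to read the file. The helper name map_file_readonly is made up for illustration; the calls themselves are plain POSIX open, fstat, mmap and close.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. On success returns the mapping and stores its
 * length in *len_out; on failure returns MAP_FAILED with errno set.
 * map_file_readonly is a hypothetical helper name, not a library function. */
const unsigned char *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return MAP_FAILED;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return MAP_FAILED;
    }

    /* PROT_READ matches O_RDONLY, and a read-only mapping needs no
     * copy-on-write reservation. A zero-length file fails with EINVAL. */
    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    int saved = errno;
    close(fd);
    errno = saved;

    if (p != MAP_FAILED)
        *len_out = (size_t)sb.st_size;
    return p;
}
```

A caller would check the result against MAP_FAILED, read bytes through the returned pointer, and eventually munmap it with the stored length.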

If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap and remap with MAP_PRIVATE the areas you intend to write to. Of course, if you intend to write to the entire file then you need 8GB of memory to do so.

Alternately, you can write 1 to /proc/sys/vm/overcommit_memory. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.
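Before changing the setting, it can help to see what it currently is. The following is a small sketch that reads the value from procfs; read_overcommit_mode is a made-up name, and the function simply reports -1 on systems where the file does not exist. Changing the value requires root and is not shown here.

```c
#include <stdio.h>

/* Read the current Linux overcommit policy.
 * Returns 0 (heuristic, the usual default), 1 (always overcommit),
 * 2 (strict accounting), or -1 if the file cannot be read or parsed. */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;

    fclose(f);
    return mode;
}
```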

answered Sep 16 '22 15:09

bdonlan

Linux (and apparently a few other UNIX systems) have the MAP_NORESERVE flag for mmap(2), which can be used to explicitly enable swap space overcommitting. This can be useful when you wish to map a file larger than the amount of free memory available on your system.

This is particularly handy when you use MAP_PRIVATE and only intend to write to a small portion of the memory-mapped range, since this would otherwise trigger swap space reservation for the entire file (or cause the system to return ENOMEM, if system-wide overcommitting hasn't been enabled and you exceed the free memory of the system).

The issue to watch out for is that if you do write to a large portion of this memory, the lazy swap space reservation may cause your application to consume all the free RAM and swap on the system, eventually triggering the OOM killer (Linux) or causing your app to receive a SIGSEGV.
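As a rough sketch of what that looks like, assuming Linux with glibc and an already opened descriptor: the helper name map_private_noreserve is invented here, and the _DEFAULT_SOURCE define is only there because glibc may hide MAP_NORESERVE under a strict -std=c99 build.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes MAP_NORESERVE with -std=c99 */
#include <stddef.h>
#include <sys/mman.h>

/* Map `length` bytes of `fd` as a private, writable mapping without reserving
 * swap space up front. Returns MAP_FAILED with errno set on failure.
 * Pages are only copied (and memory consumed) when they are written to. */
void *map_private_noreserve(int fd, size_t length)
{
    return mmap(NULL, length, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_NORESERVE, fd, 0);
}
```

Because the mapping is private, a descriptor opened with O_RDONLY is sufficient and writes never reach the file. Note that, as far as I know, the kernel ignores MAP_NORESERVE when /proc/sys/vm/overcommit_memory is set to 2.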

answered Sep 16 '22 15:09

dcoles
