
mmap() for large file I/O?

Tags: c++, linux, file-io

I'm creating a C++ utility, to be run on Linux, that converts videos to a proprietary format. The video frames are very large (up to 16 megapixels), and we need to be able to seek directly to exact frame numbers, so our file format uses libz to compress each frame individually and appends the compressed data to a file. Once all frames have been written, a journal containing metadata for each frame (including its file offset and size) is written to the end of the file.
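To make the layout concrete, here is a minimal sketch of one way such a container could look. The names FrameEntry, Footer, write_container, read_journal and read_frame are hypothetical, and the fixed-size footer used to locate the journal is an assumption (the description above only says the journal sits at the end of the file). Frames are treated as opaque, already-compressed byte strings, so libz is not involved here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical journal entry: where one compressed frame lives in the file.
struct FrameEntry {
    std::uint64_t offset;  // byte offset of the compressed frame
    std::uint64_t size;    // compressed size in bytes
};

// Hypothetical fixed-size footer so a reader can find the journal.
struct Footer {
    std::uint64_t journal_offset;
    std::uint64_t frame_count;
};

// Appends the frames sequentially, then the journal, then the footer.
// Structs are written raw, so the file is host-endian (an assumption).
void write_container(const std::string& path,
                     const std::vector<std::string>& compressed_frames) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    std::vector<FrameEntry> journal;
    std::uint64_t pos = 0;
    for (const std::string& f : compressed_frames) {
        journal.push_back({pos, f.size()});
        out.write(f.data(), static_cast<std::streamsize>(f.size()));
        pos += f.size();
    }
    Footer footer{pos, journal.size()};
    out.write(reinterpret_cast<const char*>(journal.data()),
              static_cast<std::streamsize>(journal.size() * sizeof(FrameEntry)));
    out.write(reinterpret_cast<const char*>(&footer), sizeof(Footer));
}

// Reads the footer from the end of the file, then the journal it points to.
std::vector<FrameEntry> read_journal(std::ifstream& in) {
    Footer footer{};
    in.seekg(-static_cast<std::streamoff>(sizeof(Footer)), std::ios::end);
    in.read(reinterpret_cast<char*>(&footer), sizeof(Footer));
    std::vector<FrameEntry> journal(footer.frame_count);
    in.seekg(static_cast<std::streamoff>(footer.journal_offset));
    in.read(reinterpret_cast<char*>(journal.data()),
            static_cast<std::streamsize>(journal.size() * sizeof(FrameEntry)));
    return journal;
}

// One index lookup plus one seek, regardless of the frame number.
std::string read_frame(std::ifstream& in,
                       const std::vector<FrameEntry>& journal, std::size_t n) {
    assert(n < journal.size());
    const FrameEntry& e = journal[n];
    std::string data(static_cast<std::size_t>(e.size), '\0');
    in.seekg(static_cast<std::streamoff>(e.offset));
    in.read(&data[0], static_cast<std::streamsize>(e.size));
    return data;
}
```

With the journal held in memory, finding frame n is a single array lookup, which is what makes the seek independent of the frame number.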

I'm currently using ifstream and ofstream for the file I/O, but I am looking to optimize as much as possible. I've heard that mmap() can increase performance in a lot of cases, and I'm wondering if mine is one of them. Our files will be in the tens to hundreds of gigabytes, and although writing will always be done sequentially, random-access reads should take constant time. Any thoughts as to whether I should investigate this further, and if so, does anyone have any tips for things to look out for?

Thanks!

asked Apr 20 '10 23:04 by rcv


2 Answers

On a 32-bit machine your process is limited to 2-3 GB of user address space. This means that (allowing for other memory use) you won't be able to map more than ~1 GB of your file at a time. This does NOT mean that you cannot use mmap() for very large files - just that you need to map only part of the file at a time.
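Mapping only part of the file at a time could look roughly like the sketch below. MappedWindow, map_window and unmap_window are hypothetical names, not part of any library. The one real constraint it illustrates is that mmap() needs a page-aligned file offset, so the window starts at the enclosing page boundary and the caller gets a pointer adjusted to the byte it asked for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: a read-only view of one byte range of a file.
struct MappedWindow {
    void* base = nullptr;                 // page-aligned address returned by mmap()
    std::size_t mapped_len = 0;           // length passed to mmap() and munmap()
    const unsigned char* data = nullptr;  // first byte of the requested range
    std::size_t length = 0;               // requested length
};

// Maps [offset, offset + length) of fd read-only.
// On 32-bit builds, compile with -D_FILE_OFFSET_BITS=64 so off_t is 64 bits.
MappedWindow map_window(int fd, std::uint64_t offset, std::size_t length) {
    assert(length > 0);  // mmap() rejects zero-length mappings
    MappedWindow w;
    const std::uint64_t page = static_cast<std::uint64_t>(sysconf(_SC_PAGESIZE));
    const std::uint64_t aligned = offset - (offset % page);
    const std::size_t slack = static_cast<std::size_t>(offset - aligned);
    void* p = mmap(nullptr, length + slack, PROT_READ, MAP_PRIVATE, fd,
                   static_cast<off_t>(aligned));
    if (p == MAP_FAILED) return w;  // caller checks w.data == nullptr
    w.base = p;
    w.mapped_len = length + slack;
    w.data = static_cast<const unsigned char*>(p) + slack;
    w.length = length;
    return w;
}

// Releases the window and resets it to the empty state.
void unmap_window(MappedWindow& w) {
    if (w.base != nullptr) munmap(w.base, w.mapped_len);
    w = MappedWindow{};
}
```

A reader would call map_window() with a frame's offset and compressed size, decompress from w.data, and then call unmap_window() before moving on, so that address space use stays bounded by the window size.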

That being said, mmap() can still be a big win for large files. The most significant advantage is that you don't waste memory keeping the data TWICE (one copy in the system cache, one copy in a private buffer of your application), nor CPU time making those copies. The speedup can be even greater for random access, but the "random" part must stay within the range of your current mapping(s).

answered Oct 27 '22 00:10 by slacker


If your files are 10 GB or more, then don't even think about trying to use mmap() on a 32-bit architecture. Go directly to a 64-bit OS, which should be able to handle it just fine.

Note that a file mapped into your address space doesn't consume RAM equal to its size: pages are only brought into memory when they are touched, so you won't need to install hundreds of gigabytes of RAM in your machine.
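On a 64-bit system, mapping the whole file in one go might look something like this sketch. MappedFile, map_whole_file, frame_at and unmap_file are hypothetical names. The madvise(MADV_RANDOM) call is an optional hint that access will be random, and whether it helps for a given workload is something to measure rather than assume.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper type: a read-only mapping of an entire file.
struct MappedFile {
    const unsigned char* data = nullptr;
    std::size_t size = 0;
};

// Maps the whole file read-only. Only address space is reserved here;
// physical pages are faulted in when they are first touched.
MappedFile map_whole_file(const char* path) {
    MappedFile m;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return m;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            madvise(p, len, MADV_RANDOM);  // hint only; failure is harmless
            m.data = static_cast<const unsigned char*>(p);
            m.size = len;
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return m;
}

// Returns a pointer to a frame's bytes given its offset and size.
const unsigned char* frame_at(const MappedFile& m, std::size_t offset,
                              std::size_t size) {
    assert(offset <= m.size && size <= m.size - offset);
    return m.data + offset;
}

// Releases the mapping and resets it to the empty state.
void unmap_file(MappedFile& m) {
    if (m.data != nullptr) munmap(const_cast<unsigned char*>(m.data), m.size);
    m = MappedFile{};
}
```

A frame's compressed bytes can then be handed to the decompressor straight from frame_at(), without first reading them into a separate buffer.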

answered Oct 26 '22 23:10 by Greg Hewgill