 

In C++, what is the fastest way to load a large binary (1GB-4GB) file into memory?

Tags: c++, linux, posix

On 64-bit Linux (such as an Amazon EC2 instance), I need to load a couple of large binary files into memory. What is the fastest way?

  • ifstream
  • fread
  • POSIX open
  • POSIX mmap (doesn't actually load the whole file into memory, which hurts performance)
  • something else?

Also, the node may or may not launch this executable a second time, so it would help if the file loaded even faster on subsequent attempts. Some sort of pre-loading step might even work.

asked Feb 11 '13 22:02 by Victor Lyuboslavsky

1 Answer

The time is going to be dominated by disk I/O, so which API you use matters less than thinking about how a disk works. If you access a rotating disk randomly, each seek costs 3 to 9 milliseconds; once the disk is streaming, it can sustain about 128 MB/sec, and that is how fast bits come off the disk head. The SATA link or PCIe bus has much higher bandwidth than that (600 to 2000 MB/sec).

Linux keeps a page cache in memory holding copies of pages from the disk, so provided your machine has an adequate amount of RAM, subsequent attempts will be fast even if you then access the data randomly.

So the advice is: read large blocks at a time. If you really want to speed up the initial load, you could mmap the entire file (1GB-4GB) and have a helper thread that reads the first byte of each page in order.
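A minimal sketch of the "read large blocks at a time" advice, assuming a hypothetical path /data/big.bin and a 16 MiB block size (both are illustrative choices, not anything from the answer):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #include <cstdio>
    #include <vector>

    int main() {
        const char* path = "/data/big.bin";          // hypothetical path
        constexpr size_t kBlock = 16 * 1024 * 1024;  // 16 MiB per read() call

        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        std::vector<char> data;
        if (fstat(fd, &st) == 0)
            data.reserve(static_cast<size_t>(st.st_size));  // avoid repeated regrowth

        std::vector<char> buf(kBlock);
        ssize_t n;
        // Large sequential read() calls keep the drive streaming instead of seeking.
        while ((n = read(fd, buf.data(), buf.size())) > 0)
            data.insert(data.end(), buf.data(), buf.data() + n);
        if (n < 0) perror("read");

        close(fd);
        std::printf("loaded %zu bytes\n", data.size());
        return 0;
    }

And a sketch of the mmap-plus-helper-thread idea; the MADV_SEQUENTIAL hint is an extra assumption on my part, not something the answer calls for:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #include <cstdio>
    #include <thread>

    int main() {
        const char* path = "/data/big.bin";  // hypothetical path
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
        const size_t length = static_cast<size_t>(st.st_size);

        // Map the whole file; pages are faulted in lazily on first access.
        void* base = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        // Tell the kernel we expect sequential access so it reads ahead aggressively.
        madvise(base, length, MADV_SEQUENTIAL);

        // Helper thread: touch the first byte of each page in order, pulling the
        // file into the page cache while the main thread starts using the data.
        const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
        std::thread prefetcher([base, length, page] {
            const char* p = static_cast<const char*>(base);
            volatile char sink = 0;
            for (size_t off = 0; off < length; off += page)
                sink += p[off];
            (void)sink;
        });

        // ... main thread works with the mapped data here ...

        prefetcher.join();
        munmap(base, length);
        close(fd);
        return 0;
    }

On Linux, mmap also accepts a MAP_POPULATE flag that pre-faults the whole mapping up front; the difference is that it blocks the mmap call until the read-ahead finishes, whereas the helper thread overlaps the loading with useful work.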

You can read more about disk drive performance characteristics here.

You can read more about the page cache here.

answered Oct 21 '22 00:10 by amdn