On 64-bit Linux (such as an Amazon EC2 instance), I need to load a couple of large binary files into memory. What is the fastest way?
Also, the node may or may not launch this executable a second time, so it helps if the files load even faster on subsequent attempts. Some sort of pre-loading step might even work.
The time is going to be dominated by disk I/O, so which API you use matters less than understanding how a disk works. Accessing a rotating disk randomly costs 3 to 9 milliseconds per seek; once the disk is streaming, it can sustain about 128 MB/s, which is how fast bits come off the disk head. The SATA link or PCIe bus has much higher bandwidth than that (600 to 2000 MB/s), so the drive itself is the bottleneck.

Linux keeps a page cache in memory holding copies of pages from disk, so provided your machine has adequate RAM, subsequent attempts will be fast, even if you then access the data randomly.

So the advice is: read large blocks at a time. If you really want to speed up the initial load, you could use mmap to map the entire file (1 GB to 4 GB) and have a helper thread read the first byte of each page in order.
You can read more about disk drive performance characteristics here.
You can read more about the page cache here.