 

Loading a large amount of binary data into RAM

My application needs to load anywhere from megabytes to dozens of gigabytes of binary data (multiple files) into RAM. After some searching, I decided to use std::vector<unsigned char> for this purpose, although I am not sure it's the best choice.

I would use one vector per file. Since the application knows each file's size in advance, it would call reserve() to allocate memory for it. Sometimes the application needs to read a file fully, and at other times only part of it, and vector's iterators are handy for that. It may need to unload a file from RAM and load another in its place; std::vector::swap() and std::vector::shrink_to_fit() would be very useful for that. I don't want the hard work of dealing with low-level memory allocation (otherwise I would go with C).

I have some questions:

  • The application must load as many files from a list into RAM as it can. How would it know whether there is enough memory to load one more file? Should it call reserve() and check for errors? How? The reference only says reserve() throws an exception when the requested size is greater than std::vector::max_size.
  • Is std::vector<unsigned char> suitable for getting such a large amount of binary data into RAM? I'm worried about std::vector::max_size, since its reference says its value depends on system or implementation limitations. I presume the system limitation is free RAM; is that right? If so, no problem. But what about implementation limitations? Is there anything implementation-related that could prevent me from doing what I want? If so, please give me an alternative.
  • And what if I want to use the entire RAM, except for N gigabytes? Is the best way really to use sysinfo() and decide, based on free RAM, whether each file can be loaded?

Note: this section of the application must get the best performance possible (low processing time/CPU usage and RAM consumption). I would appreciate your help.

asked Mar 08 '26 by Tiago.SR

1 Answer

How would it know if there is enough memory space to load one more file?

You wouldn't know beforehand. Wrap the loading process in a try/catch. If memory runs out, a std::bad_alloc will be thrown (assuming you use the default allocator). Assume that memory is sufficient in the loading code, and deal with the lack of memory in the exception handler.
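A minimal sketch of that approach (try_load_file is an illustrative helper name, not from the question; it reads a whole file into a vector and reports allocation failure as false instead of letting std::bad_alloc propagate):

```cpp
#include <cstddef>
#include <fstream>
#include <new>
#include <string>
#include <vector>

// Try to load an entire file into `out`. Returns false if the file
// cannot be opened, fully read, or if there is not enough memory.
bool try_load_file(const std::string& path, std::vector<unsigned char>& out)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return false;

    const std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);

    try {
        out.resize(static_cast<std::size_t>(size));  // may throw std::bad_alloc
    } catch (const std::bad_alloc&) {
        return false;  // not enough memory for this file; caller can skip it
    }

    return static_cast<bool>(in.read(reinterpret_cast<char*>(out.data()), size));
}
```

The caller can then iterate over its file list, stop (or skip files) when try_load_file returns false, and free already-loaded vectors if needed.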

But what about implementation limitations? ... Is there anything implementation-related that could prevent me from doing what I want?

You can check std::vector::max_size at run time to verify.

If the program is compiled for a 64-bit word size, then the vector's max_size is almost certainly sufficient for a few hundred gigabytes.
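That check is a one-liner; for example (vector_can_hold is an illustrative name):

```cpp
#include <cstddef>
#include <vector>

// Run-time check: does this implementation's vector allow n_bytes elements?
// max_size() is an implementation-defined theoretical ceiling; on typical
// 64-bit builds it is vastly larger than any real amount of RAM.
bool vector_can_hold(std::size_t n_bytes)
{
    return std::vector<unsigned char>().max_size() >= n_bytes;
}
```

For instance, vector_can_hold(100ull << 30) asks whether 100 GiB is below the implementation's ceiling; note this says nothing about how much memory is actually free.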


This section of the application must get the best performance possible

This conflicts with

I don't want to have the hard work of dealing with low level memory allocation stuff

But in case low level memory stuff is worth it for the performance, you could memory-map the file into memory.
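A minimal POSIX sketch of memory mapping (Linux and similar systems; map_file is an illustrative helper name). Once mapped, the file's bytes are addressable like an array, and pages are faulted in lazily as they are touched:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

// Map a whole file read-only. Returns nullptr on failure;
// *size_out receives the file size. Release later with munmap().
const unsigned char* map_file(const char* path, std::size_t* size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return nullptr;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return nullptr;
    }

    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after closing the descriptor
    if (p == MAP_FAILED)
        return nullptr;

    *size_out = static_cast<std::size_t>(st.st_size);
    return static_cast<const unsigned char*>(p);
}
```

With MAP_PRIVATE and PROT_READ the kernel can also discard clean pages under memory pressure and re-read them from disk, which sidesteps much of the "is there enough RAM?" bookkeeping.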


I've read in some SO questions to avoid them in applications that need high performance and to prefer dealing with return values, errno, etc.

Unfortunately for you, non-throwing memory allocation is not an option if you use the standard containers. If you are allergic to exceptions, then you must use another implementation of a vector (or whatever container you decide to use). With mmap, though, you don't need any container at all.

Won't handling exceptions break performance?

Luckily for you, the run-time cost of an exception is insignificant compared to reading hundreds of gigabytes from disk.

May it be better to run sysinfo() and work on checking free RAM before loading a file?

A sysinfo call may well be slower than handling an exception (I haven't measured; that is just a conjecture), and it won't tell you about process-specific limits that may exist.
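If you do want a hint from sysinfo() anyway, the call is small (Linux-only sketch; free_ram_bytes is an illustrative name):

```cpp
#include <sys/sysinfo.h>

// Returns the amount of currently free RAM in bytes, or 0 if sysinfo()
// fails. Caveat from above: this is a system-wide snapshot and ignores
// per-process limits (ulimit, cgroups), so treat it as a hint only.
unsigned long long free_ram_bytes()
{
    struct sysinfo info;
    if (sysinfo(&info) != 0)
        return 0;
    // freeram is measured in units of mem_unit bytes.
    return static_cast<unsigned long long>(info.freeram) * info.mem_unit;
}
```

Even on a correct reading, another process can allocate memory between this check and your load, so the try/catch above is still needed as the authoritative check.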

Also, it looks hard and costly to repeatedly try to load a file, catch an exception, and then try to load a smaller file (does that require recursion?)

No recursion is needed. You can use it if you prefer; it can be written with a tail call, which can be optimized away.


About memory mapping: I took a look at it some time ago and found it tedious to deal with. It would require using C's open() and all that, and saying goodbye to std::fstream.

Once you have mapped the memory, it is easier to use than std::fstream. You can skip the copy into a vector and simply use the mapped memory as if it were an array that already exists in memory.

It looks like the best way of partially reading a file using std::fstream is to derive from std::streambuf

I don't see why you would need to derive anything. Just use std::basic_fstream::seekg() to skip to the part that you wish to read.
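For example, a partial read with seekg() needs no custom stream classes at all (read_slice is an illustrative name):

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read `count` bytes starting at byte `offset` of the file.
// Returns fewer bytes (possibly none) on a short read or open failure.
std::vector<unsigned char> read_slice(const std::string& path,
                                      std::streamoff offset, std::size_t count)
{
    std::vector<unsigned char> buf;
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return buf;

    in.seekg(offset);  // jump straight to the part we want
    buf.resize(count);
    in.read(reinterpret_cast<char*>(buf.data()), count);
    buf.resize(static_cast<std::size_t>(in.gcount()));  // shrink on short read
    return buf;
}
```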

answered Mar 11 '26 by eerorika


