
Is there a downside to a significant overestimation in a reserve()?

Let's suppose we have a method that creates and uses possibly very big vector<foo>s. The maximum number of elements is known to be maxElems.

Standard practice as of C++11 is, to the best of my knowledge:

vector<foo> fooVec;
fooVec.reserve(maxElems);
//... fill fooVec using emplace_back() / push_back()

But what happens if we have a scenario where the number of elements is going to be significantly less in the majority of calls to our method?

Is there any disadvantage to the conservative reserve call other than the excess allocated memory (which supposedly can be freed with shrink_to_fit() if necessary)?
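For concreteness, a minimal sketch of that reserve-then-trim pattern (buildFooVec, actualElems and the fill loop are just placeholders for the real code):

#include <cstddef>
#include <vector>

struct foo { int value; };

std::vector<foo> buildFooVec(std::size_t maxElems, std::size_t actualElems) {
    std::vector<foo> fooVec;
    fooVec.reserve(maxElems);           // conservative upper bound
    for (std::size_t i = 0; i < actualElems; ++i)
        fooVec.push_back(foo{static_cast<int>(i)});
    fooVec.shrink_to_fit();             // non-binding request to drop the unused capacity
    return fooVec;
}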

asked Jul 26 '17 by mrclng


1 Answer

Summary

There is likely to be some downside to using a too-large reserve, but how much depends both on the size and context of your reserve() call, and on your specific allocator, operating system and their configuration.

As you are probably aware, on platforms like Windows and Linux a large allocation generally doesn't allocate any physical memory or page table entries until it is first accessed, so you might imagine large, unused allocations to be "free". Sometimes this is called "reserving" memory without "committing" it, and I'll use those terms here.
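A quick way to observe the reserve/commit distinction on a typical glibc/Linux setup (a rough sketch; it reads the resident-set size, in pages, from /proc/self/statm):

#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

// Resident-set size of the current process, in pages (Linux-specific).
long residentPages() {
    std::ifstream statm("/proc/self/statm");
    long total = 0, resident = 0;
    statm >> total >> resident;
    return resident;
}

int main() {
    std::printf("RSS before reserve : %ld pages\n", residentPages());

    std::vector<char> v;
    v.reserve(std::size_t(256) << 20);     // reserve 256 MiB: address space only, nothing committed yet
    std::printf("RSS after reserve  : %ld pages\n", residentPages());

    v.assign(std::size_t(256) << 20, 'x'); // writing the memory is what commits it
    std::printf("RSS after touching : %ld pages\n", residentPages());
}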

Here are some reasons this might not be as free as you'd imagine:

Page Granularity

The lazy commit described above only happens at page granularity. If you are using (typical) 4096-byte pages, it means that if you reserve 4,000 bytes for a vector that will usually contain elements taking up 100 bytes, the lazy commit buys you nothing! The whole 4096-byte page has to be committed either way, so you don't save any physical memory. So it isn't just the ratio between the expected and reserved sizes that matters, but the absolute reserved size that determines how much waste you'll see.
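To put numbers on that, a small sketch (assuming 4 KiB pages and ignoring allocator overhead) comparing how much memory actually gets committed for a small versus a large over-reservation:

#include <algorithm>
#include <cstddef>
#include <cstdio>

// Physical bytes committed after writing the first `used` bytes of a lazily
// committed `reserved`-byte allocation, assuming 4096-byte pages.
std::size_t committedBytes(std::size_t reserved, std::size_t used) {
    const std::size_t page = 4096;
    auto roundUpToPage = [&](std::size_t b) { return (b + page - 1) / page * page; };
    return std::min(roundUpToPage(used), roundUpToPage(reserved));
}

int main() {
    // 4,000-byte reserve, 100 bytes used: the single page is committed anyway -> no savings.
    std::printf("small: %zu of 4096 bytes committed\n", committedBytes(4000, 100));
    // 40 MiB reserve, 1 MiB used: only the touched pages are committed -> ~39 MiB saved.
    std::printf("large: %zu of %zu bytes committed\n",
                committedBytes(std::size_t(40) << 20, std::size_t(1) << 20),
                std::size_t(40) << 20);
}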

Keep in mind that many systems are now using "huge pages" transparently, so in some cases the granularity will be on the order of 2 MB or more. In that case you need allocations on the order of 10s or 100s of MB to really take advantage of the lazy allocation strategy.

Worse Allocation Performance

Memory allocators for C++ generally try to allocate large chunks of memory (e.g., via sbrk or mmap on Unix-like platforms) and then efficiently carve that up into the small chunks the application is requesting. Getting these large chunks of memory via, say, a system call like mmap may be several orders of magnitude slower than the fast-path allocation within the allocator, which is often only a dozen instructions or so. When you ask for large chunks that you mostly won't use, you defeat that optimization and will often be going down the slow path.

As a concrete example, let's say your allocator asks mmap for chunks of 128 KB, which it carves up to satisfy allocations. You are allocating about 2 KB of stuff in a typical vector, but reserve 64 KB. You'll now pay for an mmap call on every other reserve call, whereas if you just asked for the 2 KB you ultimately needed, you'd make about 32 times fewer mmap calls.
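A back-of-the-envelope sketch of that arithmetic, using the assumed 128 KB chunk size and request sizes from above:

#include <cstddef>
#include <cstdio>

// mmap-style chunk refills a simple allocator needs in order to satisfy
// `calls` requests of `request` bytes each, given a `chunk`-byte chunk size.
std::size_t chunkRefills(std::size_t calls, std::size_t request, std::size_t chunk) {
    return (calls * request + chunk - 1) / chunk;
}

int main() {
    const std::size_t chunk = 128 * 1024;  // assumed allocator chunk size
    const std::size_t calls = 1000;        // number of vectors built

    std::printf("mmap calls reserving 64 KB each time: %zu\n", chunkRefills(calls, 64 * 1024, chunk));
    std::printf("mmap calls asking for the 2 KB used : %zu\n", chunkRefills(calls, 2 * 1024, chunk));
}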

Dependence on Overcommit Handling

When you ask for a lot of memory and don't use it, you can get into a situation where you've asked for more memory than your system supports (e.g., more than your RAM + swap). Whether this is even allowed depends on your OS and how it is configured, and no matter what, you are in for some interesting behavior if you subsequently commit more of that memory simply by writing to it: arbitrary processes may be killed, or you might get unexpected errors on any memory write. What works on one system may fail on another due to different overcommit tunables.
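On Linux you can check which policy is in force by reading /proc/sys/vm/overcommit_memory (0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting); a trivial sketch:

#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Linux-specific: 0 = heuristic, 1 = always overcommit, 2 = strict accounting.
    std::ifstream f("/proc/sys/vm/overcommit_memory");
    std::string mode;
    if (f >> mode)
        std::cout << "vm.overcommit_memory = " << mode << '\n';
    else
        std::cout << "could not read the overcommit setting (not Linux?)\n";
}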

Finally, it makes managing your process a bit harder since the "VM size" metric as reported by monitoring tools won't have much relationship to what your process may ultimately commit.

Worse Locality

Allocating more memory than you need makes it likely that your working set will be more sparsely spread out in the virtual address space. The overall effect is a reduction in locality of reference. For very small allocations (e.g., a few dozen bytes) this may reduce the within-same-cache-line locality, but for larger sizes the main effect is likely to be to spread your data onto more physical pages, increasing TLB pressure. The exact thresholds will depend a lot on details like whether hugepages are enabled.
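As a rough illustration of the page-spreading effect (assuming 4 KiB pages and that each over-reserved vector's buffer ends up in its own page-aligned region):

#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t page     = 4096;
    const std::size_t vectors  = 1000;      // number of live vectors
    const std::size_t liveData = 2 * 1024;  // bytes of real data per vector

    // Over-reserved: each buffer occupies at least one page of its own.
    std::size_t spreadPages = vectors * ((liveData + page - 1) / page);
    // Densely packed (no over-reservation): buffers can share pages.
    std::size_t densePages  = (vectors * liveData + page - 1) / page;

    std::printf("pages (and TLB entries) when spread out: %zu\n", spreadPages);
    std::printf("pages (and TLB entries) when packed    : %zu\n", densePages);
}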

answered Oct 20 '22 by BeeOnRope