Can an unordered_set use a different allocator for the nodes and the bucket list?

Tags:

c++

c++17

allocator

unordered-set

c++pmr

I'd like to use a std::pmr::unordered_set with a std::pmr::monotonic_buffer_resource. The two fit well together, because the set's nodes are stable, so I don't create a lot of holes in the buffer resource by reallocation:

 std::pmr::monotonic_buffer_resource res;
 std::pmr::unordered_set<T> set(&res);

That is, except for the bucket list, which needs to be reallocated when the set rehashes as it exceeds the max_load_factor(). Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown, what are my options?

If I know that unordered_set is implemented as std::vector<std::forward_list>, as in (some versions of) MSVC, then I should be able to use a std::scoped_allocator_adaptor to give different allocators to the vector and the forward_list. But a) I can't rely on unordered_set being a vector<forward_list> in portable code and b) scoped_allocator_adaptor is an Allocator whereas monotonic_buffer_resource is a memory_resource, an impedance mismatch that will make for very complicated initialization.

Or I could write a switch_memory_resource that delegates to other memory_resources based on the size of the request. I could then use a monotonic_buffer_resource for requests that match the size of the nodes (which, however, I cannot portably know either) and get_default_resource() for everything else. I could probably make an educated guess that the nodes are at most sizeof(struct {void* next; size_t hash; T value;}), add some error margin by multiplying that by two, and use that as the cut-off between the two memory_resources, but I wonder whether there's a cleaner way?
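For illustration, here is a minimal sketch of the size-switching idea. The class names switch_memory_resource and counting_resource, the node_guess struct, and the demo function are all hypothetical; the threshold of twice the guessed node size is the educated guess described above, not something any standard library guarantees. The counting resource stands in for the default resource so the routing can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_set>

// Hypothetical resource: requests of at most `threshold` bytes go to `small`,
// everything else goes to `large`.
class switch_memory_resource : public std::pmr::memory_resource {
public:
    switch_memory_resource(std::size_t threshold,
                           std::pmr::memory_resource* small,
                           std::pmr::memory_resource* large)
        : threshold_(threshold), small_(small), large_(large) {}

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        return pick(bytes)->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        // Containers hand back the size they asked for, so the same
        // size test finds the resource that produced the block.
        pick(bytes)->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    std::pmr::memory_resource* pick(std::size_t bytes) const {
        return bytes <= threshold_ ? small_ : large_;
    }

    std::size_t threshold_;
    std::pmr::memory_resource* small_;
    std::pmr::memory_resource* large_;
};

// Hypothetical helper: forwards to new/delete and counts allocations.
class counting_resource : public std::pmr::memory_resource {
public:
    std::size_t count = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++count;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Guessed node layout (an assumption; real layouts differ per library).
struct node_guess { void* next; std::size_t hash; int value; };

// Returns how many requests exceeded the threshold (the larger bucket arrays).
std::size_t demo_large_allocations() {
    std::pmr::monotonic_buffer_resource nodes;
    counting_resource buckets;
    switch_memory_resource res(2 * sizeof(node_guess), &nodes, &buckets);
    std::pmr::unordered_set<int> set(&res);
    for (int i = 0; i < 1000; ++i) set.insert(i);
    return buckets.count;
}
```

Note that bucket arrays smaller than the threshold (the first one or two, on some implementations) would still land in the monotonic buffer with this cut-off.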


asked Nov 17 '20 14:11

Marc Mutz - mmutz

1 Answer

The small number of concrete resource types that I proposed a number of years ago and that were adopted into C++17 was a minimalist set of useful allocators. As evidenced by your question, they do not provide optimal behavior for every circumstance. There are not many tuning dials and I have some regrets about missing functionality, but they are still useful for most cases.

For your specific situation, you say "Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown." I'm not sure any general allocator can help you. The geometric growth of the bucket list will leave holes in any allocation strategy. The question is whether those holes can be reused and/or minimized. As you point out, only a very carefully customized allocator for the very specific situation will minimize these holes. But maybe your assumptions are too strong.

Consider a std::pmr::vector<int>. This is the worst-case scenario for a monotonic_buffer_resource because every reallocation results in leaked memory. And yet, even this case has a worst-case memory waste of only about 50% when the capacity doubles on each reallocation; i.e., it will never use more than twice as much memory as it would with a resource that perfectly reuses memory blocks. Granted, 50% can be pretty bad, but in your scenario, we are talking much, much less. For a reasonably large set, the bucket list is small compared to the nodes and the data itself, and you can use reserve() to minimize reallocation. So my first piece of advice is to go ahead and use the monotonic_buffer_resource without alteration, and measure to see if you have unacceptable memory use. A second experiment would be to use an unsynchronized_pool_resource backed by an (upstream) monotonic_buffer_resource.
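One way to run both experiments is to put a counting resource upstream of the monotonic_buffer_resource and compare how many bytes each configuration pulls from it. The following is only a sketch: byte_counter, fill, and the two measuring functions are hypothetical names, and the totals include the unused tail of the last chunk the monotonic resource requested, so they are a rough upper bound rather than an exact figure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_set>

// Hypothetical helper: forwards to new/delete and totals the bytes requested.
class byte_counter : public std::pmr::memory_resource {
public:
    std::size_t bytes_requested() const { return total_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        total_ += bytes;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::size_t total_ = 0;
};

// Insert n distinct ints into a set that allocates from `res`.
void fill(std::pmr::memory_resource* res, int n) {
    std::pmr::unordered_set<int> set(res);
    for (int i = 0; i < n; ++i) set.insert(i);
}

// Experiment 1: the unmodified monotonic_buffer_resource.
std::size_t bytes_with_plain_monotonic(int n) {
    byte_counter upstream;
    std::pmr::monotonic_buffer_resource mono(&upstream);
    fill(&mono, n);
    return upstream.bytes_requested();
}

// Experiment 2: a pool resource that can recycle freed blocks,
// drawing its chunks from the monotonic resource.
std::size_t bytes_with_pool_over_monotonic(int n) {
    byte_counter upstream;
    std::pmr::monotonic_buffer_resource mono(&upstream);
    std::pmr::unsynchronized_pool_resource pool(&mono);
    fill(&pool, n);
    return upstream.bytes_requested();
}
```

Calling both functions with a realistic element count and comparing the results against the size of the live data gives a first impression of whether the waste matters; which configuration comes out ahead depends on the standard library and the workload.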

If you decide you want to create a custom resource for this purpose, which might be fruitful and might even be fun, your approach of choosing some lower threshold for passing to the monotonic allocator would probably work and would not actually be a lot of effort. You could also consider making it adaptive: Keep a list of the last, say, 4, allocation sizes. If any size gets more than two hits, then assume it is your node size and allocate those nodes from the monotonic resource while other requests get passed directly to the upstream resource. Be careful, however, that you use such a custom allocator only when you know what's going on. If you have a std::pmr::unordered_set<std::pmr::string>, then both approaches may result in many of the strings getting allocated from the upstream resource and thus losing the benefit of the monotonic buffer. As you can see, being too stingy with memory for your bucket list can backfire in a generic context. You're likely to discover that the unmodified monotonic_buffer_resource was a better bet.
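The adaptive variant could look roughly like the sketch below. Everything here is an assumption for illustration: the name adaptive_node_resource, the history length of 4, and the "more than two hits" rule are taken from the description above, while the list of early pointers is an extra bookkeeping step added so that blocks handed out by the upstream resource before detection are returned to it and not to the monotonic resource.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_set>
#include <vector>

// Hypothetical resource: watches the last 4 request sizes; once one size has
// more than two hits, requests of that size go to `node_pool`.
class adaptive_node_resource : public std::pmr::memory_resource {
public:
    adaptive_node_resource(std::pmr::memory_resource* node_pool,
                           std::pmr::memory_resource* upstream)
        : node_pool_(node_pool), upstream_(upstream) {}

    // 0 until a node size has been detected.
    std::size_t detected_node_size() const { return node_size_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        if (node_size_ == 0) observe(bytes);
        if (node_size_ != 0 && bytes == node_size_)
            return node_pool_->allocate(bytes, align);
        void* p = upstream_->allocate(bytes, align);
        // Remember blocks handed out before detection; some may be node-sized.
        if (node_size_ == 0) early_.push_back(p);
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        auto it = std::find(early_.begin(), early_.end(), p);
        if (it != early_.end()) {
            early_.erase(it);
            upstream_->deallocate(p, bytes, align);
        } else if (bytes == node_size_) {
            node_pool_->deallocate(p, bytes, align);
        } else {
            upstream_->deallocate(p, bytes, align);
        }
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
    void observe(std::size_t bytes) {
        history_[next_++ % history_.size()] = bytes;
        if (bytes != 0 &&
            std::count(history_.begin(), history_.end(), bytes) > 2)
            node_size_ = bytes;
    }

    std::pmr::memory_resource* node_pool_;
    std::pmr::memory_resource* upstream_;
    std::array<std::size_t, 4> history_{};
    std::size_t next_ = 0;
    std::size_t node_size_ = 0;
    std::vector<void*> early_;
};

// Fills a set through the adaptive resource and reports the detected size.
std::size_t demo_detected_node_size() {
    std::pmr::monotonic_buffer_resource nodes;
    adaptive_node_resource res(&nodes, std::pmr::new_delete_resource());
    {
        std::pmr::unordered_set<int> set(&res);
        for (int i = 0; i < 1000; ++i) set.insert(i);
    }
    return res.detected_node_size();
}
```

The heuristic locks onto the first size that repeats, which is exactly why it is fragile with element types that allocate on their own: with std::pmr::string elements, the detected size could just as well be a string buffer.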

Good luck, and please report your findings back here. Also, if you have an idea for a general-purpose resource that could address your problem (or any other common allocation problem), I'd love to hear about it. There's certainly room in the standard for a few more useful resource types.


answered Oct 20 '22 04:10

Pablo Halpern
