Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can an unordered_set use a different allocator for the nodes and the bucket list?

I'd like to use a std::pmr::unordered_map with a std::pmr::monotonic_buffer_resource. The two fit well together, because the set's nodes are stable, so I don't create a lot of holes in the buffer resource by reallocation:

 std::pmr::monotonic_buffer_resource res;
 std::pmr::unordered_set<T> set(&res);

That is, except for the bucket list, which needs to be reallocated when the set rehashes as it exceeds the max_load_factor(). Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown, what are my options?

If I know that unordered_set is implemented as std::vector<std::forward_list>, as in (some versions of) MSVC, then I should be able to use a scoped_allocator to give different allocators for the vector and the forward_list. But a) I can't rely on unordered_set being a vector<forward_list> in portable code and b) scoped_allocator is an Allocator whereas monotonic_buffer_resource is a memory_resource, an impedance mismatch that will make for very complicated initialization.

Or I could write a switch_memory_resource that delegates to other memory_resources based on the size of the request. I could then use a monotonic_buffer_resource for requests that match the size of the nodes (which, however, I cannot, portably, know, either) and default_memory_resource() for everything else. I could probably make an educated guess that the nodes are at most sizeof(struct {void* next; size_t hash; T value;}), add some error margin by multiplying that by two and use that as the cut-off between the two memory_resources, but I wonder whether there's a cleaner way?

like image 314
Marc Mutz - mmutz Avatar asked Nov 17 '20 14:11

Marc Mutz - mmutz


People also ask

Can unordered_set have duplicate values?

Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - However, they can be inserted and removed. Unordered sets do not allow duplicates and are initialized using comma-delimited values enclosed in curly braces.

Is unordered set unique?

Unordered sets are containers that store unique elements in no particular order, and which allow for fast retrieval of individual elements based on their value. In an unordered_set, the value of an element is at the same time its key, that identifies it uniquely.

Is unordered_set faster than set?

Comparing the timings for std::set , std::unordered_set with the original hash, and std::unordered_set with the extra hash, we see that for sufficiently large number of elements, std::unordered_set actually becomes faster than std::set : Another solution would be not to use a power of two for the number of buckets.

What is the difference between Unordered_map and unordered_set?

unordered_set only contains keys, and no values. There is no mapping from a key to a value, so no need for an operator[] . unordered_map maps a key to a value.

What is unordered set in Python?

Unordered sets are containers that store unique elements in no particular order, and which allow for fast retrieval of individual elements based on their value. In an unordered_set, the value of an element is at the same time its key, that identifies it uniquely.

What is the difference between set and unordered_set in STL?

If you want to look at implementation details of set and unordered_set in c++ STL, see Set Vs Map. Set allows to traverse elements in sorted order whereas Unordered_set doesn’t allow to traverse elements in sorted order. Set can be modified to find predecessor or successor whereas Unordered_set doesn’t allow to find predecessor/Successor.

How to create an unordered (bulleted) list?

The HTML <ul> tag defines an unordered (bulleted) list. An unordered list starts with the <ul> tag. Each list item starts with the <li> tag. The CSS list-style-type property is used to define the style of the list item marker.

What is the difference between bucket size and erase in unordered_set?

bucket_size () – Returns the total number of elements present in a specific bucket in an unordered_set container. erase () – Remove either a single element or a range of elements ranging from start (inclusive) to end (exclusive). size () – Return the number of elements in the unordered_set container.


1 Answers

The small number of concrete resource types that I proposed a number of years ago and that were adopted into C++17 was a minimalist set of useful allocators. As evidenced by your question, they do not provide optimal behavior for every circumstance. There are not many tuning dials and I have some regrets about missing functionality, but they are still useful for most cases.

For your specific situation, you say "Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown." I'm not sure any general allocator can help you. The geometric growth of the bucket list will leave holes in any allocation strategy. The question is whether those holes can be re-used and/or minimized. As you point out, only a very-carefully customized allocator for the very specific situation will minimize these holes. But maybe your assumptions are too strong.

Consider a std::pmr::vector<int>. This is the worst-case scenario for a monotonic_buffer_resource because every reallocation results in leaked memory. And yet, even this case has a worst-case memory waste of only 50%; i.e., it will never use more than twice as much memory as it would with a resource that perfectly reuses memory blocks. Granted, 50% can be pretty bad, but in your scenario, we are talking much, much less. For a reasonably large set, the bucket list is small compared to the buckets and the data itself, and you can use reserve to minimize reallocation. So my first piece of advice is to go ahead and use the monotonic_buffer_resource without alteration, and measure to see if you have unacceptable memory use. A second experiment would be to use an unsynchronized_pool_resource backed by an (upstream) monotonic_buffer_resource.

If you decide you want to create a custom resource for this purpose, which might be fruitful and might even be fun, your approach of choosing some lower threshold for passing to the monotonic allocator would probably work and would not actually be a lot of effort. You could also consider making it adaptive: Keep a list of the last, say, 4, allocation sizes. If any size gets more than two hits, then assume it is your node size and allocate those nodes from the monotonic resource while other requests get passed directly to the upstream resource. Be careful, however, that you use such a custom allocator only when you know what's going on. If you have a std::pmr::unordered_set<std::pmr::string>, then both approaches may result in many of the strings getting allocated from the upstream resource and thus losing the benefit of the monotonic buffer. As you can see, being too stingy with memory for your bucket list can backfire in a generic context. You're likely to discover that the unmodified monotonic_buffer_resource was a better bet.

Good luck, and please report your findings back here. Also, if you have an idea for a general-purpose resource that could address your problem (or any other common allocation problem), I'd love to hear about it. There's certainly room in the standard for a few more useful resource types.

like image 168
Pablo Halpern Avatar answered Oct 20 '22 04:10

Pablo Halpern