I want to create a record that would hold the information about
in a node of a tree. I would explicitly store this information only for the leaf nodes, while the information for the parent node can be obtaining through combining the information of all of it's children (e.g. child 1 has 3 objects of A, 1 object of B, child 2 has 1 object of A, 2 objects of C -- parent has 4 objects of A, 1 object of B and 2 of C).
I will be careful when requesting this information from the parent nodes not to first request, use and discard information for a child node and then for its parent node, but the upward construction will be a common operation. Other two common operations are directly derived from what I store: is the object of kind X present? and how many objects of kind X is present? and also how many kinds of objects are present?
Object kinds are represented as integers, and the object numbers are always integer values. What is the better choice (and arguments for the selected choice):
std::multiset<int>
, and operate with std::multiset::count()
and std::multiset::find()
operations (easier union but duplication of elements, total distinct element count hard to obtain)std::map<int, std::size_t>
with the kind as a key and number of objects as a value (no duplicate elements, std::map::find()
function present, size gives the correct number of object kinds stored, but accessing a non-existent element increases the size unintentionally)Thank you for your suggestions!
C++ hash map and hash set which preserves the order of insertion. The ordered-map library provides a hash map and a hash set which preserve the order of insertion in a way similar to Python's OrderedDict. When iterating over the map, the values will be returned in the same order as they were inserted.
Multiset in C++: Multiset is a type of associative containers that store elements following a specific order, and where multiple elements can have same values.
Note that this is the largest item in the multiset, because the multiset is always stored in sorted order.
Multimap in C++The multimap container in C++ is similar to the map container with an addition that a multimap can have multiple key-value pairs with the same key. Rather than each element is unique, the key-value and mapped value pair have to be unique in this case.
What about a sorted std::vector<int>
? The operations you need can be satisfied as follows:
std::binary_search
std::equal_range
, subtract .first
from .second
std::unique_copy
followed by size()
of the copy, or...std::binary_search
before inserting into the vectorAdvantages to this approach are cache locality (all your data is contiguous) and lower memory footprint compared to a tree-like structure. Without knowing more about your data, I can't say for sure whether it would be faster or slower. You'll have to profile it to find out, but I have a hunch this will perform better than you might expect.
The biggest tradeoff here is expressiveness. The std::map
approach probably does a better job of logically conveying what you're doing, i.e., a relationship between object IDs and a count.
To store a total of n items with k distinct values per your comparison predicate, an std::multiset
allocates n binary search tree nodes(*). An std::map
allocates only k (slightly larger) nodes.
You'd use std::multiset
when two items can be considered equal by your comparison predicate, but must still be explicitly stored, because they differ in some aspect that the comparison predicate does not check. Also, iterating over a multiset
generates each of the n items, whereas a map
would generate each of the k distinct items with the count for each.
In the case where the items are just integers, go with std::map
. Your "how many distinct items" query would then just be a call to size
, which runs in constant time.
Your claim that "accessing a non-existent element increases the size unintentionally" is only true if you use operator[]
to access nodes. find
does not exhibit this behavior.
(*) The C++ standard does not guarantee that these containers are implemented as (balanced) BSTs, but in all implementations that I've seen, they are.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With