Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

std::multiset<int> vs. std::map<int, std::size_t> for keeping multiple repeatable integer values

I want to create a record that would hold the information about

  • a) what kind of elements are present and
  • b) the number of elements of each kind present

in a node of a tree. I would explicitly store this information only for the leaf nodes, while the information for the parent node can be obtaining through combining the information of all of it's children (e.g. child 1 has 3 objects of A, 1 object of B, child 2 has 1 object of A, 2 objects of C -- parent has 4 objects of A, 1 object of B and 2 of C).

I will be careful when requesting this information from the parent nodes not to first request, use and discard information for a child node and then for its parent node, but the upward construction will be a common operation. Other two common operations are directly derived from what I store: is the object of kind X present? and how many objects of kind X is present? and also how many kinds of objects are present?

Object kinds are represented as integers, and the object numbers are always integer values. What is the better choice (and arguments for the selected choice):

  • use std::multiset<int>, and operate with std::multiset::count() and std::multiset::find() operations (easier union but duplication of elements, total distinct element count hard to obtain)
  • use std::map<int, std::size_t> with the kind as a key and number of objects as a value (no duplicate elements, std::map::find() function present, size gives the correct number of object kinds stored, but accessing a non-existent element increases the size unintentionally)

Thank you for your suggestions!

like image 679
penelope Avatar asked May 30 '12 09:05

penelope


People also ask

Does C++ map preserve insertion order?

C++ hash map and hash set which preserves the order of insertion. The ordered-map library provides a hash map and a hash set which preserve the order of insertion in a way similar to Python's OrderedDict. When iterating over the map, the values will be returned in the same order as they were inserted.

What is the point of a multiset?

Multiset in C++: Multiset is a type of associative containers that store elements following a specific order, and where multiple elements can have same values.

Is multiset always sorted?

Note that this is the largest item in the multiset, because the multiset is always stored in sorted order.

What is multiset and multimap in c++?

Multimap in C++The multimap container in C++ is similar to the map container with an addition that a multimap can have multiple key-value pairs with the same key. Rather than each element is unique, the key-value and mapped value pair have to be unique in this case.


2 Answers

What about a sorted std::vector<int>? The operations you need can be satisfied as follows:

  • Is the object of kind X present? std::binary_search
  • How many objects of kind X are there? std::equal_range, subtract .first from .second
  • How many kinds of objects are present?
    • std::unique_copy followed by size() of the copy, or...
    • use a separate counter, call std::binary_search before inserting into the vector

Advantages to this approach are cache locality (all your data is contiguous) and lower memory footprint compared to a tree-like structure. Without knowing more about your data, I can't say for sure whether it would be faster or slower. You'll have to profile it to find out, but I have a hunch this will perform better than you might expect.

The biggest tradeoff here is expressiveness. The std::map approach probably does a better job of logically conveying what you're doing, i.e., a relationship between object IDs and a count.

like image 40
Michael Kristofik Avatar answered Oct 19 '22 09:10

Michael Kristofik


To store a total of n items with k distinct values per your comparison predicate, an std::multiset allocates n binary search tree nodes(*). An std::map allocates only k (slightly larger) nodes.

You'd use std::multiset when two items can be considered equal by your comparison predicate, but must still be explicitly stored, because they differ in some aspect that the comparison predicate does not check. Also, iterating over a multiset generates each of the n items, whereas a map would generate each of the k distinct items with the count for each.

In the case where the items are just integers, go with std::map. Your "how many distinct items" query would then just be a call to size, which runs in constant time.

Your claim that "accessing a non-existent element increases the size unintentionally" is only true if you use operator[] to access nodes. find does not exhibit this behavior.

(*) The C++ standard does not guarantee that these containers are implemented as (balanced) BSTs, but in all implementations that I've seen, they are.

like image 136
Fred Foo Avatar answered Oct 19 '22 09:10

Fred Foo