std::multiset<int> vs. std::map<int, std::size_t> for keeping multiple repeatable integer values

Tags:

c++

stl

containers

I want to create a record that would hold the information about

a) what kind of elements are present and
b) the number of elements of each kind present

in a node of a tree. I would explicitly store this information only for the leaf nodes, while the information for a parent node can be obtained by combining the information of all of its children (e.g. child 1 has 3 objects of A and 1 object of B, child 2 has 1 object of A and 2 objects of C, so the parent has 4 objects of A, 1 object of B and 2 objects of C).
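A minimal sketch of that upward combination, assuming the std::map<int, std::size_t> option and using the hypothetical kind codes 1, 2 and 3 for A, B and C. CountMap and combine_children are illustrative names, not part of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// kind -> number of objects of that kind.
// Assumed invariant: only kinds with a count > 0 are stored.
using CountMap = std::map<int, std::size_t>;

// Parent counts = per-kind sum of the children's counts.
CountMap combine_children(const std::vector<CountMap>& children) {
    CountMap parent;
    for (const CountMap& child : children) {
        for (const auto& entry : child) {
            assert(entry.second > 0);  // children must respect the invariant
            // operator[] inserts {kind, 0} for a new kind; here that is intended.
            parent[entry.first] += entry.second;
        }
    }
    return parent;
}
```

With child 1 = {1: 3, 2: 1} and child 2 = {1: 1, 3: 2}, combine_children returns {1: 4, 2: 1, 3: 2}, matching the example above.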

I will be careful, when requesting this information from the parent nodes, not to first request, use and discard the information for a child node and then request it for its parent, but the upward construction will be a common operation. The other common operations follow directly from what I store: is an object of kind X present? How many objects of kind X are present? And how many kinds of objects are present?

Object kinds are represented as integers, and the object counts are always integers. Which of the following is the better choice, and why?

- use std::multiset<int> and operate with std::multiset::count() and std::multiset::find() (easier union, but elements are duplicated and the total number of distinct elements is hard to obtain)
- use std::map<int, std::size_t> with the kind as the key and the number of objects as the value (no duplicate elements, std::map::find() is available, and size() gives the correct number of object kinds stored, but accessing a non-existent element increases the size unintentionally)

Thank you for your suggestions!


asked May 30 '12 09:05

penelope

2 Answers

What about a sorted std::vector<int>? The operations you need can be satisfied as follows:

- Is an object of kind X present? std::binary_search
- How many objects of kind X are there? std::equal_range, then subtract .first from .second
- How many kinds of objects are present?
  - std::unique_copy, followed by size() of the copy, or...
  - keep a separate counter, and call std::binary_search before each insertion into the vector to check whether the kind is new
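The operations listed above can be sketched as follows, assuming each node keeps its objects' kinds in a std::vector<int> that is already sorted. The helper names are hypothetical, and building a parent with std::merge is an assumption added here; the answer itself does not say how the vectors are combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Is at least one object of `kind` present? (O(log n))
bool has_kind(const std::vector<int>& sorted, int kind) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));  // debug-only O(n) check
    return std::binary_search(sorted.begin(), sorted.end(), kind);
}

// How many objects of `kind`? equal_range brackets the run of equal values.
std::size_t count_of(const std::vector<int>& sorted, int kind) {
    auto range = std::equal_range(sorted.begin(), sorted.end(), kind);
    return static_cast<std::size_t>(range.second - range.first);
}

// How many distinct kinds? unique_copy into a scratch vector, then size().
std::size_t distinct_kinds(const std::vector<int>& sorted) {
    std::vector<int> uniq;
    std::unique_copy(sorted.begin(), sorted.end(), std::back_inserter(uniq));
    return uniq.size();
}

// Hypothetical upward step: merging two sorted children keeps the parent sorted.
std::vector<int> merge_children(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> parent;
    parent.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(parent));
    return parent;
}
```

Merging {1, 1, 1, 2} with {1, 3, 3} gives {1, 1, 1, 1, 2, 3, 3}, for which count_of(parent, 1) is 4 and distinct_kinds(parent) is 3.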

The advantages of this approach are cache locality (all your data is contiguous) and a lower memory footprint compared to a tree-like structure. Without knowing more about your data, I can't say for sure whether it would be faster or slower. You'll have to profile it to find out, but I have a hunch it will perform better than you might expect.

The biggest tradeoff here is expressiveness. The std::map approach probably does a better job of logically conveying what you're doing, i.e., a relationship between object IDs and a count.

answered Oct 19 '22 09:10

Michael Kristofik

To store a total of n items with k distinct values per your comparison predicate, an std::multiset allocates n binary search tree nodes(*). An std::map allocates only k (slightly larger) nodes.

You'd use std::multiset when two items can be considered equal by your comparison predicate, but must still be explicitly stored, because they differ in some aspect that the comparison predicate does not check. Also, iterating over a multiset generates each of the n items, whereas a map would generate each of the k distinct items with the count for each.
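A sketch of the multiset side of that comparison, under the same assumption that kinds are plain ints. absorb_child, distinct_kinds and to_counts are hypothetical names; the upper_bound hop is just one way to count distinct values, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Union is easy: insert every element of the child (one tree node per object).
void absorb_child(std::multiset<int>& parent, const std::multiset<int>& child) {
    parent.insert(child.begin(), child.end());
}

// Distinct kinds: hop from each run of equal values to the next with
// upper_bound, i.e. one lookup per distinct kind.
std::size_t distinct_kinds(const std::multiset<int>& objs) {
    std::size_t kinds = 0;
    for (auto it = objs.begin(); it != objs.end(); it = objs.upper_bound(*it)) {
        ++kinds;
    }
    return kinds;
}

// Collapse the n stored elements into k (kind, count) pairs, i.e. the map form.
std::map<int, std::size_t> to_counts(const std::multiset<int>& objs) {
    std::map<int, std::size_t> counts;
    for (int kind : objs) {  // visits all n elements, duplicates included
        ++counts[kind];
    }
    assert(counts.size() <= objs.size());  // k <= n
    return counts;
}
```

The union really is a one-liner, but every duplicate becomes its own node, and recovering the number of kinds or the per-kind counts takes extra work that a map gets from size() and its values.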

In the case where the items are just integers, go with std::map. Your "how many distinct items" query would then just be a call to size, which runs in constant time.

Your claim that "accessing a non-existent element increases the size unintentionally" is only true if you use operator[] to access elements, because it default-inserts a zero count for a missing key. find (and count) do not exhibit this behavior.
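A minimal sketch of the map-based record with the three queries from the question, assuming the invariant that only kinds with a non-zero count are stored (otherwise size() would overcount). The function names are hypothetical. The last function shows the operator[] pitfall: reading a missing kind through it inserts a zero entry.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

using CountMap = std::map<int, std::size_t>;  // kind -> number of objects

// Record n more objects of `kind`. Zero counts are rejected so that
// size() keeps meaning "number of kinds present".
void add_objects(CountMap& counts, int kind, std::size_t n) {
    assert(n > 0);
    counts[kind] += n;  // deliberate insertion, so operator[] is fine here
}

// Is an object of kind X present? map::count is 0 or 1 and never inserts.
bool has_kind(const CountMap& counts, int kind) {
    return counts.count(kind) != 0;
}

// How many objects of kind X? find never inserts; a missing kind yields 0.
std::size_t count_of(const CountMap& counts, int kind) {
    auto it = counts.find(kind);
    return it == counts.end() ? 0 : it->second;
}

// How many kinds are present? size() is constant time.
std::size_t kind_count(const CountMap& counts) {
    return counts.size();
}

// Pitfall: a "read" through operator[] inserts {kind, 0} for a missing kind.
std::size_t count_of_via_subscript(CountMap& counts, int kind) {
    return counts[kind];
}
```

For example, after add_objects(m, 1, 3), add_objects(m, 2, 1) and add_objects(m, 1, 1), count_of(m, 1) is 4 and kind_count(m) is 2; a single call to count_of_via_subscript(m, 7) then returns 0 but raises kind_count(m) to 3.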

(*) The C++ standard does not guarantee that these containers are implemented as (balanced) BSTs, but in all implementations that I've seen, they are.


answered Oct 19 '22 09:10

Fred Foo
