I am currently experimenting on some usage of stl-datastructures. However I am still not sure when to use which one and when to use a certain combination. Currently I am trying to figure out, when using a std::multimap
does make sense. As far as I can see, one can easily build ones own multimap implementation by combining std::map
and std::vector
. So I am left with the question when each of these datastructures should be used.
std::vector
).std::multimaps
also have a lot of optimization tricks behind the back to make iterating over equal elements as fast as possible. Also getting to the correct element-range might probably be optimized for std::multimaps
.In order to try out the speed issues I did some simple comparisons using the following program:
#include <stdint.h> #include <iostream> #include <map> #include <vector> #include <utility> typedef std::map<uint32_t, std::vector<uint64_t> > my_mumap_t; const uint32_t num_partitions = 100000; const size_t num_elements = 500000; int main() { srand( 1337 ); std::vector<std::pair<uint32_t,uint64_t>> values; for( size_t i = 0; i <= num_elements; ++i ) { uint32_t key = rand() % num_partitions; uint64_t value = rand(); values.push_back( std::make_pair( key, value ) ); } clock_t start; clock_t stop; { start = clock(); std::multimap< uint32_t, uint64_t > mumap; for( auto iter = values.begin(); iter != values.end(); ++iter ) { mumap.insert( *iter ); } stop = clock(); std::cout << "Filling std::multimap: " << stop - start << " ticks" << std::endl; std::vector<uint64_t> sums; start = clock(); for( uint32_t i = 0; i <= num_partitions; ++i ) { uint64_t sum = 0; auto range = mumap.equal_range( i ); for( auto iter = range.first; iter != range.second; ++iter ) { sum += iter->second; } sums.push_back( sum ); } stop = clock(); std::cout << "Reading std::multimap: " << stop - start << " ticks" << std::endl; } { start = clock(); my_mumap_t mumap; for( auto iter = values.begin(); iter != values.end(); ++iter ) { mumap[ iter->first ].push_back( iter->second ); } stop = clock(); std::cout << "Filling my_mumap_t: " << stop - start << " ticks" << std::endl; std::vector<uint64_t> sums; start = clock(); for( uint32_t i = 0; i <= num_partitions; ++i ) { uint64_t sum = 0; auto range = std::make_pair( mumap[i].begin(), mumap[i].end() ); for( auto iter = range.first; iter != range.second; ++iter ) { sum += *iter; } sums.push_back( sum ); } stop = clock(); std::cout << "Reading my_mumap_t: " << stop - start << " ticks" << std::endl; } }
As I suspected it depends mainly on the ratio between num_partitions
and num_elements
, so I am still at a loss here. Here are some example outputs:
For num_partitions = 100000
and num_elements = 1000000
Filling std::multimap: 1440000 ticks Reading std::multimap: 230000 ticks Filling my_mumap_t: 1500000 ticks Reading my_mumap_t: 170000 ticks
For num_partitions = 100000
and num_elements = 500000
Filling std::multimap: 580000 ticks Reading std::multimap: 150000 ticks Filling my_mumap_t: 770000 ticks Reading my_mumap_t: 140000 ticks
For num_partitions = 100000
and num_elements = 200000
Filling std::multimap: 180000 ticks Reading std::multimap: 90000 ticks Filling my_mumap_t: 290000 ticks Reading my_mumap_t: 130000 ticks
For num_partitions = 1000
and num_elements = 1000000
Filling std::multimap: 970000 ticks Reading std::multimap: 150000 ticks Filling my_mumap_t: 710000 ticks Reading my_mumap_t: 10000 ticks
I am unsure about how to interpret these results. How would you go about deciding for the correct data structure? Are there any additional constraints for the decission, which I might have missed?
Multimap is similar to a map with the addition that multiple elements can have the same keys. Also, it is NOT required that the key-value and mapped value pair have to be unique in this case. One important thing to note about multimap is that multimap keeps all the keys in sorted order always.
The map and the multimap are both containers that manage key/value pairs as single components. The essential difference between the two is that in a map the keys must be unique, while a multimap permits duplicate keys.
Multimaps are part of the C++ STL (Standard Template Library). Multimaps are the associative containers like map that stores sorted key-value pair, but unlike maps which store only unique keys, multimap can have duplicate keys. By default it uses < operator to compare the keys.
Multi-map in C++ is an associative container like map. It internally store elements in key value pair. But unlike map which store only unique keys, multimap can have duplicate keys.
It's hard to tell whether your benchmark is doing the right thing, so I can't comment on the numbers. However, a few general points:
Why multimap
rather than map of vectors: Maps, multimaps, sets and multisets are all essentially the same data structure, and once you have one, it's trivial to just spell out all four. So the first answer is, "why not have it"?
How is it useful: Multimaps are one of those things that you need rarely, but when you need them, you really need them.
Why not roll my own solution? As I said, I'm not sure about those benchmarks, but even if you could make something else that isn't worse than the standard container (which I question), then you should consider the overall burden of getting it right, testing it and maintaining it. Imagine a world in which you would be taxed for every line of code you wrote (that's Stepanov's suggestion). Re-use industry-standard components whenever possible.
Finally, here's the typical way you iterate a multimap:
for (auto it1 = m.cbegin(), it2 = it1, end = m.cend(); it1 != end; it1 = it2) { // unique key values at this level for ( ; it2 != end && it2->first == it1->first; ++it2) { // equal key value (`== it1->first`) at this level } }
A map of vectors comes with the memory overhead for the capacity of each vector. std::vector
typically allocates space for more elements than you actually have. It may not be a big deal for your application, but it's another tradeoff you haven't considered.
If you're doing a lot of reads, then the O(1) lookup time of unordered_multimap
might be a better choice.
If you have a reasonably modern compiler (and given the presence of the auto
keyword, you do) then in general you're going to have a difficult time beating the standard containers in terms of performance and reliability. The people who wrote them are experts. I would always start with the standard container that most easily expresses what you want to do. Profile your code early and often, and if it's not running fast enough, then look for ways to improve it (e.g., using the unordered_
containers when doing mostly reads).
So, to answer your original question, if you need an associative array of values where those values won't be unique, then using std::multimap
definitely makes sense.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With