Why is unordered_map "find + insert" faster than "insert + check for success"?

Question

I'm using an unordered_map as a sparse 3D-array (128 x 128 x 128) to insert values into a grid, provided the grid cell is still free.

Up until now I always checked with find() if the cell is free and if it is, then I've added an element using insert() or emplace(). Now I've found that I can use the return value of insert and emplace to check if the element has been added or if there was already an element with the same key inside the map. I thought this could improve performance since I could completely remove the usage of find.

As it turns out, rather than improving the performance by inserting without find, the performance actually decreased and I'm not sure why.

I've reduced my application to this example where points are randomly generated and then inserted into the grid.

#include <unordered_map> #include <random> #include <chrono> #include <iostream> #include <math.h> #include <algorithm> #include <string>  using std::cout; using std::endl; using std::chrono::high_resolution_clock; using std::chrono::milliseconds; using std::chrono::duration_cast; using std::unordered_map;  int num_elements = 5'000'000;   void findThenInsert(){     cout << endl << "find and emplace" << endl;      auto start = high_resolution_clock::now();      std::mt19937 gen(123);     std::uniform_real_distribution<> dis(0, 128);      unordered_map<int, int> grid;     int count = 0;     for(int i = 0; i < num_elements; i++){         float x = dis(gen);         float y = dis(gen);         float z = (cos(x*0.1) * sin(x*0.1) + 1.0) * 64.0;          int index = int(x) + int(y) * 128 + int(z) * 128 * 128;         auto it = grid.find(index);         if(it == grid.end()){             grid.emplace(index, count);             count++;         }     }      cout << "elements: " << count << endl;     cout << "load factor: " << grid.load_factor() << endl;      auto end = high_resolution_clock::now();     long long duration = duration_cast<milliseconds>(end - start).count();     float seconds = duration / 1000.0f;     cout << seconds << "s" << endl; }   void insertThenCheckForSuccess(){     cout << endl << "emplace and check success" << endl;      auto start = high_resolution_clock::now();      std::mt19937 gen(123);     std::uniform_real_distribution<> dis(0, 128);      unordered_map<int, int> grid;     int count = 0;     for(int i = 0; i < num_elements; i++){         float x = dis(gen);         float y = dis(gen);         float z = (cos(x*0.1) * sin(x*0.1) + 1.0) * 64.0;          int index = int(x) + int(y) * 128 + int(z) * 128 * 128;         auto it = grid.emplace(index, count);         if(it.second){             count++;         }     }      cout << "elements: " << count << endl;     cout << "load factor: " << grid.load_factor() << endl;      auto end = high_resolution_clock::now();     long long duration = duration_cast<milliseconds>(end - start).count();     float seconds = duration / 1000.0f;     cout << seconds << "s" << endl; }  int main(){      findThenInsert();     insertThenCheckForSuccess();  }

In both cases the size of the map is 82901 afterwards so I assume the result is exactly the same.

 find and emplace:   0.937s emplace then check: 1.268s

ecatmur · Accepted Answer

The problem is that the specification of emplace for associative containers in effect requires an allocation even in the failure case; the cost of this allocation and reallocation dominates the cost of a failed probe in the find-then-insert strategy.

This is because emplace is specified to emplace-construct value_type (i.e. pair<Key const, T>) from its forwarded arguments; only once it has constructed the pair can it hash the key to check whether it is already present. (It can't just take the first argument, because that could be std::piecewise_construct.) It also can't construct the pair in automatic storage and then move it into a node, because emplace is not specified to require a copyable or even movable value_type, so it has to perform a potentially expensive node allocation on every call. (Note that the ordered associative containers have the same problem, but there the O(log n) cost of a probe is more significant compared to the cost of allocation.)

Unless your inserts are expected to succeed in the majority of cases, you are better off using find-then-emplace over emplace-then-test. You could also use insert, as long as you make sure you're calling the value_type overload and not the template that forwards to emplace.

This is (probably) fixed in C++17, which (should) have try_emplace, with similar semantics to emplace but improved performance in the failure case. (The semantic difference is that the mapped type is not emplace-constructed in the failure case; this makes it possible to e.g. store unique_ptr as the mapped type.)

Chris Drew · Answer

I think the issue is that you are using emplace instead of insert. The problem is that emplace functions in associative containers usually allocate memory for the node even if the key is already present. So that if you are regularly emplacing duplicates those memory allocations are wasted. If you used insert instead it would only do the memory allocation if the insert is successful.

Scott Meyers says to only prefer emplace functions over insert functions if "the container won't reject the value being added due to it being a duplicate"

I can't quite reproduce your results exactly but my testing shows that insert (not emplace) then test is even faster than find then emplace:

auto it = grid.insert({index, count});

This decision might also depend on how costly it is to create your value type. find does not need to construct the value type, it just needs the key. But emplace and insert need the key and the value type so in cases where it is costly to create the value it may be faster to use find and only create the value if you need to. In this case your value is just an int so I expect insert or emplace to always win over find-then-emplace.

Why is unordered_map "find + insert" faster than "insert + check for success"?

Tags:

Markus

2 Answers

ecatmur

Chris Drew

Recent Activity

Donate For Us

Why is unordered_map "find + insert" faster than "insert + check for success"?

Tags:

Markus

2 Answers

ecatmur

Chris Drew

Related questions

Recent Activity

Donate For Us