I tried comparing the performance of std::unordered_set::find and std::find. Contrary to my expectation and the rule https://clang.llvm.org/extra/clang-tidy/checks/performance-inefficient-algorithm.html, std::find was significantly faster (quick-bench with unordered_set). The unordered_set contains ints, so hashing shouldn't be an issue. std::set::find acts as expected and is faster than std::find (quick-bench with set).
Can anyone please explain this behaviour? Thank you
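For reference, a minimal sketch of the kind of comparison described above (the original quick-bench snippets are not reproduced here, so the container contents and function names below are illustrative assumptions): a very small unordered_set of ints, looked up once with the member find() and once with std::find.

#include <algorithm>
#include <unordered_set>
#include <benchmark/benchmark.h>

static const std::unordered_set<int> smallSet { 1, 2, 3, 4, 5 };

static void SmallSetMemberFind(benchmark::State& state) {
    for (auto _ : state) {
        // hash the key, then probe a single bucket
        auto it = smallSet.find(3);
        benchmark::DoNotOptimize(it);
    }
}
BENCHMARK(SmallSetMemberFind);

static void SmallSetStdFind(benchmark::State& state) {
    for (auto _ : state) {
        // walk the buckets linearly until the key is found
        auto it = std::find(smallSet.begin(), smallSet.end(), 3);
        benchmark::DoNotOptimize(it);
    }
}
BENCHMARK(SmallSetStdFind);

With only a handful of elements, the hashing and bucket indirection inside the member find() can cost more than a short linear scan, which would match the surprising result described in the question.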
On the topic of choosing a set-like container, most experienced C++ developers will tell you: "Use std::unordered_set instead of std::set. It's more efficient, certainly for large data sets."
For a small number of elements, lookups in a set might be faster than lookups in an unordered_set. And even though many operations are faster on average for unordered_set, they often have better guaranteed worst-case complexity for set (for example insert).
Sets vs Unordered Sets: a set is an ordered sequence of unique keys, whereas an unordered_set is a set in which keys can be stored in any order, hence "unordered". A set is implemented as a balanced tree structure, which is why it is possible to maintain order between the elements (by a specific tree traversal).
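A small sketch of that distinction (the container names and values here are just illustrative): iterating a set visits the keys in sorted order because of the underlying tree, while an unordered_set gives no ordering guarantee; both provide a member find().

#include <iostream>
#include <set>
#include <unordered_set>

int main() {
    std::set<int> ordered { 5, 1, 4, 2, 3 };
    std::unordered_set<int> hashed { 5, 1, 4, 2, 3 };

    // Tree traversal yields the keys in sorted order: 1 2 3 4 5
    for (int x : ordered) std::cout << x << ' ';
    std::cout << '\n';

    // Bucket order is unspecified; any permutation may be printed
    for (int x : hashed) std::cout << x << ' ';
    std::cout << '\n';

    // Both containers provide a member find(): O(log n) for set,
    // average O(1) for unordered_set
    bool inOrdered = ordered.find(3) != ordered.end();
    bool inHashed  = hashed.find(3) != hashed.end();
    std::cout << inOrdered << ' ' << inHashed << '\n';
}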
Some of this also depends on the hardware and the implementation. But to get a clearer idea of what's going on, it can be useful to graph the time taken for a number of different sizes with each.
For this test, I used Microsoft's compiler, so some difference from clang/llvm isn't particularly surprising. Just for grins, I threw in an extra contender, testing std::lower_bound (after sorting the array, of course) in addition to std::find, set, and unordered_set.
I also did the testing a bit differently, generating random numbers to fill the container, and a set of 1000 random values to search for in the container. That's (probably) responsible for the less-than-linear growth at the right end for std::find. Microsoft's random number generator only has a 15-bit range, so with 100,000 elements we're going to hit every value it can generate well before we've generated 100,000 values, so in the last test the searching was limited by the range of values we could generate rather than the size of the array.
I suppose if I were ambitious, I'd rewrite it using a better random number generator with a larger range, but I think this is enough to establish the trends, and give a pretty good idea of the expected result from that modification.
Edit: corrected misalignment of data pasted into spreadsheet.
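For illustration only, here is one way such a size sweep could be put together with a full-range generator (std::mt19937) instead of a 15-bit rand(); the sizes, probe count, and timing approach below are assumptions for this sketch, not the setup used for the measurements described above.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

int main() {
    std::mt19937 gen(42);                                   // full 32-bit range, unlike a 15-bit rand()
    std::uniform_int_distribution<int> dist(0, 1000000);

    long sink = 0;                                          // keeps the lookups from being optimized away

    for (std::size_t n : { 100, 1000, 10000, 100000 }) {
        std::vector<int> data(n);
        for (auto& v : data) v = dist(gen);

        std::set<int> s(data.begin(), data.end());
        std::unordered_set<int> us(data.begin(), data.end());
        std::vector<int> sorted = data;
        std::sort(sorted.begin(), sorted.end());

        std::vector<int> probes(1000);                      // 1000 random values to search for
        for (auto& v : probes) v = dist(gen);

        auto measure = [&](auto&& lookup) {
            auto t0 = std::chrono::steady_clock::now();
            long hits = 0;
            for (int p : probes) hits += lookup(p);
            auto t1 = std::chrono::steady_clock::now();
            sink += hits;
            return std::chrono::duration<double, std::micro>(t1 - t0).count();
        };

        std::cout << "n=" << n
                  << "  std::find: "   << measure([&](int p) { return std::find(data.begin(), data.end(), p) != data.end(); })
                  << "  lower_bound: " << measure([&](int p) {
                         auto it = std::lower_bound(sorted.begin(), sorted.end(), p);
                         return it != sorted.end() && *it == p; })
                  << "  set: "           << measure([&](int p) { return s.find(p) != s.end(); })
                  << "  unordered_set: " << measure([&](int p) { return us.find(p) != us.end(); })
                  << "  (microseconds for 1000 lookups)\n";
    }
    std::cout << "(hits: " << sink << ")\n";                // use sink so the compiler cannot discard it
}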
The problem is that you chose a set that was too small.
Here is an example with 1000 elements.
#include <algorithm>
#include <set>
#include <unordered_set>
#include <benchmark/benchmark.h>

const static std::unordered_set<int> mySet {
    0,
    1,
    2,
    ...   // elements 3 through 997 elided in the original
    998,
    999
};
static void UsingSetFind(benchmark::State& state) {
    // Code inside this loop is measured repeatedly
    for (auto _ : state) {
        // hashed member lookup
        auto it = mySet.find(435);
        benchmark::DoNotOptimize(it);
    }
}
// Register the function as a benchmark
BENCHMARK(UsingSetFind);

static void UsingStdFind(benchmark::State& state) {
    // Code before the loop is not measured
    for (auto _ : state) {
        // linear scan over the same container, searching for the same key
        auto it = std::find(mySet.begin(), mySet.end(), 435);
        benchmark::DoNotOptimize(it);
    }
}
BENCHMARK(UsingStdFind);
The difference is amazing. mySet.find(435) will search as in a hash table, really quick, while std::find(mySet.begin(), mySet.end(), 435) will go through the elements one by one.
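As a usage note (the clang-tidy check linked in the question is essentially about preferring the member functions): membership tests on an unordered_set are idiomatically written with the member find(), count(), or, in C++20, contains(), all of which use the hash. This is a small sketch assuming C++20 for contains().

#include <cassert>
#include <unordered_set>

int main() {
    std::unordered_set<int> s { 1, 2, 3 };
    assert(s.find(2) != s.end());   // member find(): average O(1) hash lookup
    assert(s.count(3) == 1);        // count(): same complexity, returns 0 or 1
    assert(s.contains(1));          // C++20 contains(): the most direct spelling
}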