I have a C++ program that inserts approximately 18 million numbers of type uint64_t into a google dense_hash_set.
The numbers are all the even numbers below 2^64 with the property
N >= radical(N)^4, where radical(N) is the product of the distinct prime factors of N.
The inserts are an order of magnitude slower than, for example, inserting 18 million random numbers or 18 million sequential numbers.
When profiling the inserts, the code spends most of its time executing the statement
if ( test_empty(bucknum) )
Is 18 million a sensible number of items to insert into a dense_hash_set?
Is there any way of speeding up the inserts?
The relevant lines are:
uint64_t N;
google::dense_hash_set<uint64_t> evencandidates;
evencandidates.set_empty_key(-1);   // -1 wraps to 2^64 - 1, a value that never occurs in the data
.....
evencandidates.insert(N);
Solved by replacing the default hash function with std::tr1::hash. The declaration of the dense hash set becomes:
google::dense_hash_set<uint64_t, std::tr1::hash<uint64_t> > evencandidates;
The criterion for choosing numbers to store,
N >= radical(N)^4 and N even,
means the stored numbers share many common factors, and in particular most of them are divisible by several powers of 2. Choosing a hash function that distributes this set of numbers well solves the performance problem.