I have 13721057
elements in my std::vector<Sequence>
. I need to sort this vector and grab the first 25 elements. I thought, since you can build a heap in O(N)
it must be faster to pop 25 elements (each being O(logN)
) than sorting the whole vector in O(NlogN)
.
However, when I time the code:
clock_t tStart = clock();
sort(mostFrequent.begin(), mostFrequent.end(), greater<Sequence>());
printf("Time taken: %.2fs\n", (double)(clock() - tStart)/CLOCKS_PER_SEC);
vs.
clock_t tStart = clock();
make_heap(mostFrequent.begin(), mostFrequent.end());
printf("Time taken: %.2fs\n", (double)(clock() - tStart)/CLOCKS_PER_SEC);
It appears to be much faster to sort the whole vector. Why is this?
This is not a full answer but to get the first 25 elements out of 13721057 you better use partial_sort
.
If you only need the 25th element, then nth_element
.
As a side note. For getting the first elements less than X in sorted order, I would do auto mid = std::partition
with a lambda, and then std::sort(begin,mid)
. There may be a better way.
EDIT: As suggested in a comment I also tried with a pre-sorted input and in that case I did manage to get sort faster than make_heap for my "expensive to copy" type, but only by a small margin around 5-10%.
No matter what I try, I am unable to reproduce your results on either Solaris or Linux (gcc 4.4). make_heap
has always come out on the order of 1/3rd the time spent.
int
then a larger "expensive to copy" class.I assume that the actual reason for this discrepancy is that either your <
and >
operators aren't the same complexity or that copying your object is somehow expensive relative to comparing it in a way my test was unable to duplicate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With