I have an algorithm that runs on my dual-core, 3 GHz Intel processor in 250 ms on average, and I am trying to optimize it. Currently, I have an std::nth_element call that is invoked around 6,000 times on std::vectors of between 150 and 300 elements, taking 50 ms on average. I've spent some time optimizing the comparator I use, which currently looks up two doubles from a vector and does a simple < comparison. The comparator takes a negligible fraction of std::nth_element's running time, and its copy constructor is also trivial.
Since this call currently takes 20% of my algorithm's running time, and since the time is mostly spent in the code for nth_element that I did not write (i.e. not in the comparator), I'm wondering whether anyone knows of a way to optimize nth_element using SIMD or any other approach. I've seen some questions on parallelizing std::nth_element using OpenCL and multiple threads, but since the vectors are pretty short, I'm not sure how much benefit I would get from that approach, though I'm open to being told I'm wrong. If there is an SSE approach, I can use any SSE instruction up to (the current, I think) SSE4.2.
Thanks!
Two thoughts:
Multithreading probably won't speed up processing for any single vector, but might help you as the number of vectors grows large.
Sorting is too powerful a tool for your problem: you're computing the entire order of the vector, but you don't care about anything other than the top few elements. For each vector you know how many elements make up the top 5%, so instead of sorting the whole thing you should make one pass through the array and find the k largest. You can do this in O(n) time with k extra space, so it's probably a win overall.