I'd like to improve the speed of the box-counting method I use in fractal analysis.
I have a stream of ints (about n=2^24 long) and I have to calculate how many distinct values are in the stream. There's no upper bound and negative values are allowed (but the number of negative values is probably less than sqrt(n)). There's a small correlation in the stream, i.e. each element is likely to be equal to, or not far from, the previous one. In many cases there are a lot of equal values across the whole range.
My first implementation put all the elements into a vector, then applied std::sort followed by std::unique.
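The baseline above can be sketched as follows; the function name `count_distinct_sort` is mine, not from the original post:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count distinct values by sorting, then dropping adjacent duplicates.
// Takes the vector by value because std::sort reorders it.
std::size_t count_distinct_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    auto last = std::unique(v.begin(), v.end());
    return static_cast<std::size_t>(last - v.begin());
}
```

For example, `count_distinct_sort({3, 1, 3, 2, 1})` returns 3.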
The complexity of this method is O(n*log(n)), and I don't think any other algorithm can scale better in general. But I'm sure code must exist that is faster than this with the same scaling properties, i.e. faster only by a constant factor. The reasons are:
To eliminate the first ineffectiveness point, I put every element into a set with set::insert, and at the end I counted the number of elements with set::size.
My expectation was that this code must be faster because only unique values are stored in the set, so it doesn't have to compare new elements with a lot of equal values. But unfortunately this method was 1.5 times slower than the previous one.
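The std::set variant described above amounts to the following sketch (again, the function name is my own):

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Insert every element into an ordered set; the set keeps only one copy
// of each value, so its final size is the number of distinct elements.
std::size_t count_distinct_set(const std::vector<int>& v) {
    std::set<int> s;
    for (int x : v)
        s.insert(x);
    return s.size();
}
```

The slowdown is plausible: each insert is an O(log k) tree walk with pointer-chasing and a node allocation per unique value, which is much less cache-friendly than sorting a contiguous vector.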
To eliminate the second ineffectiveness point as well, I not only put every element into a set, but did so with set::emplace_hint, each time giving a hint to place the new element next to the previous one. At the end, I asked for the size of the set with set::size.
My expectation was that this code must be faster than the previous one, because I can guess the value of the new element and that's better than nothing. But unfortunately this method was 5 times slower than the previous one.
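A minimal sketch of the hinted variant, assuming the hint passed is the iterator returned by the previous insertion (the function name is my own):

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Insert with a hint: reuse the iterator from the previous insertion,
// betting that consecutive (correlated) stream values land near each
// other in the set. emplace_hint is O(1) amortized only when the hint
// is correct; a wrong hint costs a normal O(log k) search on top.
std::size_t count_distinct_hinted(const std::vector<int>& v) {
    std::set<int> s;
    auto hint = s.begin();
    for (int x : v)
        hint = s.emplace_hint(hint, x);
    return s.size();
}
```

Note that the standard's hint semantics expect an iterator *just after* the insertion position; a hint that is merely "nearby" does not help, which may explain the measured slowdown.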
Can you suggest any efficient method that calculates the number of distinct elements (ints) in a stream? Can you optimize the code if it is known that
the target architecture is a modern x86 or x86-64 PC processor (with SSE, SSE2), and only single-threaded code is suitable? I prefer not to use Boost, but C++11 is fine.
First, thanks for the many suggestions, patience and understanding, and I'm sorry that I cannot test all of the methods; I'm also sure that effectiveness depends on the details of the stream of ints, which I've not provided. However, I share the results I got with the VS2013 compiler. (The code was also tested under gcc 4.7 but not measured.) This topic is worth a lot more time to investigate, but I've got a solution that fits my needs.
About the methods:
Since you only deal with a limited range of integer numbers, the radix sort algorithm can be effectively employed here, removing the log(N) part of the complexity. You can pick a really fast implementation somewhere on the internet. Some of them require SSE support, are multithreaded, or are even coded to run on a GPU.
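As a self-contained illustration of the idea (not taken from any particular library), here is a plain LSD radix sort on 8-bit digits; flipping the sign bit maps signed ints onto an unsigned ordering, so negative values sort correctly:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort over four 8-bit digit passes, O(n) total.
// XOR with the sign bit turns two's-complement order into unsigned order.
void radix_sort(std::vector<int>& v) {
    std::vector<uint32_t> a(v.size()), tmp(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        a[i] = static_cast<uint32_t>(v[i]) ^ 0x80000000u;
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {0};           // count[d+1] = occurrences of digit d
        for (uint32_t x : a)
            ++count[((x >> shift) & 0xFFu) + 1];
        for (int d = 0; d < 256; ++d)           // prefix sums -> start offsets
            count[d + 1] += count[d];
        for (uint32_t x : a)                    // stable scatter by current digit
            tmp[count[(x >> shift) & 0xFFu]++] = x;
        a.swap(tmp);
    }
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<int>(a[i] ^ 0x80000000u);
}
```

After sorting, the distinct count follows from a single pass with std::unique, exactly as in the original first method.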
If you have the possibility to use boost::unordered_set or C++11 std::unordered_set, then your second approach can be easily modified to use it, also resulting in a linear-complexity algorithm. However, if you have at least a few million numbers in the stream, I believe the first method would be faster.
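A minimal sketch of the hash-based variant with std::unordered_set (the function name and the reserve sizing are my choices, not the answerer's):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hash-based distinct count: average O(1) per insertion, O(n) overall.
// Reserving capacity up front avoids repeated rehashing during the pass.
std::size_t count_distinct_hash(const std::vector<int>& v) {
    std::unordered_set<int> s;
    s.reserve(v.size());
    for (int x : v)
        s.insert(x);
    return s.size();
}
```
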