Algorithm for finding the maximum difference in an array of numbers

Tags:

c++

algorithm

statistics

I have an array of a few million numbers.

double* const data = new double[3600000];

I need to iterate through the array and find the range (the largest value in the array minus the smallest value). However, there is a catch. I only want to find the range where the smallest and largest values are within 1,000 samples of each other.

So I need to find the maximum of: range(data + 0, data + 1000), range(data + 1, data + 1001), range(data + 2, data + 1002), ...., range(data + 3599000, data + 3600000).

I hope that makes sense. Basically I could do it like above, but I'm looking for a more efficient algorithm if one exists. I think the above algorithm is O(n), but I feel that it's possible to optimize. An idea I'm playing with is to keep track of the most recent maximum and minimum and how far back they are, then only backtrack when necessary.

I'll be coding this in C++, but a nice algorithm in pseudo code would be just fine. Also, if this number I'm trying to find has a name, I'd love to know what it is.

Thanks.

asked Sep 29 '08 08:09

Imbue

2 Answers

This type of question belongs to a branch of algorithms called streaming algorithms. It is the study of problems that require not only an O(n) solution but also need to work in a single pass over the data. The data is fed to the algorithm as a stream; the algorithm can't save all of the data, and once an item has gone by it is lost forever. The algorithm needs to compute some answer about the data, such as the minimum or the median.

Specifically you are looking for a maximum (or more commonly in literature - minimum) in a window over a stream.

Here's a presentation on an article that mentions this problem as a sub-problem of what they are trying to get at. It might give you some ideas.

I think the outline of the solution is something like this: maintain a window over the stream where, in each step, one element is inserted into the window and one is removed from the other side (a sliding window). The items you actually keep in memory aren't all of the 1000 items in the window, but only selected representatives that are good candidates for being the minimum (or maximum).

Read the article. It's a bit complex, but after 2-3 reads you can get the hang of it.
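The sliding-window outline above can be sketched in C++ roughly as follows. This is only a minimal sketch, not the method from the article: it assumes the "representatives" are kept in two double-ended queues of indices (one for maximum candidates, one for minimum candidates), and the function name `max_window_range_deque` is made up for illustration. Each sample is pushed and popped at most once per queue, so the whole pass is O(n) regardless of the window size.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Largest (max - min) over every run of `window` consecutive samples.
// Hypothetical helper; returns 0 for empty input or a zero-width window.
// If the input is shorter than `window`, the whole input is treated as one window.
double max_window_range_deque(const std::vector<double>& data, std::size_t window) {
    std::deque<std::size_t> max_idx;  // candidate indices, values non-increasing front to back
    std::deque<std::size_t> min_idx;  // candidate indices, values non-decreasing front to back
    double best = 0.0;
    if (window == 0) return best;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // A newer sample that is at least as extreme makes older candidates useless.
        while (!max_idx.empty() && data[max_idx.back()] <= data[i]) max_idx.pop_back();
        while (!min_idx.empty() && data[min_idx.back()] >= data[i]) min_idx.pop_back();
        max_idx.push_back(i);
        min_idx.push_back(i);
        // At most one candidate per queue can slide out of the window per step.
        if (max_idx.front() + window <= i) max_idx.pop_front();
        if (min_idx.front() + window <= i) min_idx.pop_front();
        // Partial windows at the start are subsets of the first full window,
        // so including them cannot inflate the result.
        best = std::max(best, data[max_idx.front()] - data[min_idx.front()]);
    }
    return best;
}
```

For the array in the question this would be called with the 3,600,000 samples and `window = 1000`.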

answered Sep 29 '22 22:09

shoosh

The algorithm you describe is really O(N), but I think the constant is too high. Another solution which looks reasonable is to use an O(N*log(N)) algorithm in the following way:

* create sorted container (std::multiset) of first 1000 numbers
* in loop (j=1; j<=(3600000-1000); ++j)
   - calculate range (largest element of the set minus the smallest)
   - remove from the set the number which is now irrelevant (i.e. at index *j - 1* of the array)
   - add to the set the new relevant number (i.e. at index *j+1000-1* of the array)
* calculate range one last time for the final window

I believe it should be faster, because the constant is much lower.
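A minimal C++ sketch of the steps above, under a few assumptions of mine: the function name `max_window_range_multiset` is hypothetical, the data is assumed to contain no NaN values (they would break the ordering of the set), and an input shorter than the window simply yields 0. Each step costs O(log w) for a window of w samples.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Largest (max - min) over every run of `window` consecutive samples,
// keeping the current window in a sorted multiset.
// Hypothetical helper; returns 0 if there is no full window.
double max_window_range_multiset(const std::vector<double>& data, std::size_t window) {
    if (window == 0 || data.size() < window) return 0.0;
    // Sorted container holding the first window.
    std::multiset<double> current(data.begin(),
                                  data.begin() + static_cast<std::ptrdiff_t>(window));
    double best = *current.rbegin() - *current.begin();
    for (std::size_t j = 1; j + window <= data.size(); ++j) {
        // Erase exactly one copy of the sample leaving the window;
        // erase(value) would remove every duplicate of it.
        current.erase(current.find(data[j - 1]));
        // Add the sample entering the window.
        current.insert(data[j + window - 1]);
        best = std::max(best, *current.rbegin() - *current.begin());
    }
    return best;
}
```

Erasing through `find` rather than by value matters here: repeated samples are likely in measured data, and erasing by value would drop all of them from the window at once.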

answered Sep 29 '22 22:09

Drakosha
