 

Minimum and maximum of the last 1000 values of a changing list

I'm creating an iterative algorithm (Monte Carlo method). The algorithm returns a value on every iteration, creating a stream of values.

I need to analyze these values and stop the algorithm when, say, the last 1000 returned values are within some epsilon of each other.

I decided to implement it by calculating the max and min of the last 1000 values, computing the error as (max - min) / min, and comparing it to epsilon: error <= epsilon. When this condition is reached, the iterations stop and the result is returned.

  1. The first hare-brained idea was to use a list, append new values to it, and recalculate the max and min of its last 1000 values after each append.

  2. Then I decided there is no use in keeping more than the last 1000 values, so I remembered collections.deque. That was a very good idea, since adding and deleting at either end of a deque is O(1). But it didn't solve the problem of having to scan all of the last 1000 values on every iteration to compute the min and max (a sketch of this approach follows the list).

  3. Then I remembered the heapq module. It keeps the data organized so that the smallest element can be returned efficiently at any moment. But I need both the smallest and the largest element. Furthermore, I need to preserve the order of the elements so that I can keep the last 1000 values returned by the algorithm, and I don't see how to achieve that with heapq.
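
For reference, here is a minimal sketch of the deque approach from item 2, assuming the algorithm is exposed as an iterable of values (the names `values`, `window`, and `epsilon` are placeholders, not from the question):

```python
from collections import deque

def run_until_converged(values, window=1000, epsilon=1e-6):
    """Stop when the last `window` values satisfy (max - min) / min <= epsilon.

    Assumes the values are positive, as the (max - min) / min error formula implies.
    """
    last = deque(maxlen=window)            # old values fall off the left end in O(1)
    for value in values:
        last.append(value)
        if len(last) == window:            # only test once the window is full
            lo, hi = min(last), max(last)  # O(window) scan each iteration -- the bottleneck
            if (hi - lo) / lo <= epsilon:
                return value
    raise RuntimeError("stream ended before convergence")
```

The `maxlen` argument makes the deque discard the oldest value automatically, but the min/max scan is still O(window) per iteration, which is exactly the cost the question is trying to avoid.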

Having all those thoughts in mind, I decided to ask here:

How can I solve this task the most efficiently?

asked Oct 24 '11 by ovgolovin


2 Answers

If you are free / willing to change your definition of error, you might want to consider using the variance instead of (max-min)/min.

You can compute the variance incrementally. True, using this method, you are not deleting any values from your stream -- the variance will depend on all the values. But so what? With enough values, the first few won't matter a great deal to the variance, and the variance of the average, variance/n, will become small when enough values cluster around some fixed value.

So, you can choose to halt when the variance/n < epsilon.
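
A minimal sketch of one way to do the incremental computation (Welford's online algorithm; the function name and epsilon value are placeholders, and the stopping rule is the variance/n test described above):

```python
def run_until_variance_small(values, epsilon=1e-6):
    """Halt when variance / n of all values seen so far drops below epsilon."""
    n = 0
    mean = 0.0
    m2 = 0.0                              # sum of squared deviations from the running mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n                 # update the running mean
        m2 += delta * (x - mean)          # Welford's variance update
        if n > 1:
            variance = m2 / (n - 1)       # sample variance of everything seen so far
            if variance / n < epsilon:
                return mean
    raise RuntimeError("stream ended before convergence")
```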

answered Sep 18 '22 by unutbu


As a refinement of @unutbu's excellent idea, you could consider using an exponentially weighted moving variance. It can be computed in O(1) time per observation, requires O(1) space, and has the advantage of automatically reducing an observation's weight as it gets older.

The following paper has the relevant formulae: link. See equations (140)-(143) therein.
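
A minimal sketch of the update rules (this follows the standard exponentially weighted mean/variance recurrence; `alpha` is a hypothetical smoothing factor chosen by the caller, not something the answer prescribes):

```python
def ew_mean_var(values, alpha=0.01):
    """Yield (value, ew_mean, ew_var) for each observation in O(1) time and space per step."""
    mean = None
    var = 0.0
    for x in values:
        if mean is None:
            mean = x                      # initialise the mean with the first observation
        else:
            delta = x - mean
            mean += alpha * delta                                # exponentially weighted mean
            var = (1.0 - alpha) * (var + alpha * delta * delta)  # exponentially weighted variance
        yield x, mean, var
```

You could then stop iterating as soon as `var` (or its square root, as suggested in the next paragraph) falls below your chosen threshold.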

Finally, you might want to work with the standard deviation instead of variance. It is simply the square root of variance, and has the advantage of having the same units as the original data. This should make it easier to formulate a meaningful stopping criterion.

answered Sep 17 '22 by NPE