 

Calculating Percentiles on the fly

I'm programming in Java. Every 100 ms my program gets a new number.

It has a cache that contains the history of the last n = 180 numbers. When I get a new number x I want to calculate how many numbers in the cache are smaller than x. Afterwards I want to delete the oldest number from the cache.

Every 100 ms I want to repeat the process of calculating how many smaller numbers there are and delete the oldest number.

Which algorithm should I use? I would like to optimize for making the calculation fast, as it's not the only thing calculated in those 100 ms.

Christian asked Oct 19 '10

3 Answers

For practical reasons and reasonable values of n, you're best off with a ring buffer of primitive ints (to keep track of the oldest entry) and a linear scan to determine how many values are smaller than x.

In order for this to be in O(log n) you would have to use something like Guava's TreeMultiset. Here is an outline of how it would look.

import java.util.LinkedList;
import java.util.Queue;
import java.util.SortedMap;
import java.util.TreeMap;

class Statistics {

    private final static int N = 180;

    // FIFO of the raw values, so we know which entry is oldest
    Queue<Integer> queue = new LinkedList<Integer>();

    // value -> number of occurrences, kept in sorted key order
    SortedMap<Integer, Integer> counts = new TreeMap<Integer, Integer>();

    public int insertAndGetSmallerCount(int x) {

        queue.add(x);                                // O(1)
        counts.put(x, getCount(x) + 1);              // O(log N)

        int lessCount = 0;                           // O(N), unfortunately
        for (int i : counts.headMap(x).values())     // use Guava's TreeMultiset
            lessCount += i;                          // for O(log n)

        if (queue.size() > N) {                      // O(1)
            int oldest = queue.remove();             // O(1)
            int newCount = getCount(oldest) - 1;     // O(log N)
            if (newCount == 0)
                counts.remove(oldest);               // O(log N)
            else
                counts.put(oldest, newCount);        // O(log N)
        }

        return lessCount;
    }

    private int getCount(int x) {
        return counts.containsKey(x) ? counts.get(x) : 0;
    }

}

On my 1.8 GHz laptop, this solution performs 1,000,000 iterations in about 13 seconds (i.e. one iteration takes about 0.013 ms, well under the 100 ms budget).
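To make the headMap counting step concrete, here is a minimal stand-alone check of just that part (the class name and the sample values are made up for illustration):

```java
import java.util.TreeMap;

public class HeadMapCountDemo {

    /** Sum of multiplicities of all keys strictly below x, as in the loop above. */
    static int smallerCount(TreeMap<Integer, Integer> counts, int x) {
        int less = 0;
        for (int c : counts.headMap(x).values())  // headMap(x) excludes x itself
            less += c;
        return less;
    }

    public static void main(String[] args) {
        // value -> multiplicity for the cached window {5, 3, 8, 3, 9}
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int v : new int[] {5, 3, 8, 3, 9})
            counts.merge(v, 1, Integer::sum);

        System.out.println(smallerCount(counts, 8)); // prints 3: the 5 and the two 3s
    }
}
```

Note that the loop over headMap is what makes this O(N); Guava's TreeMultiset can answer the same query in O(log n) because it maintains subtree counts.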

13 revs answered Nov 12 '22


You can keep an array of 180 numbers and save an index to the oldest one, so that when a new number comes in you overwrite the number at the oldest index and increment the index modulo 180 (it's a bit more complex than that, since you need special behaviour for the first 180 numbers).

As for calculating how many numbers are smaller, I would use the brute-force way (iterate over all the numbers and count).
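A minimal sketch of this ring-buffer approach (class and method names are my own; here the oldest value is overwritten before the scan, which is harmless since x is never smaller than itself):

```java
public class RingStats {

    private final int[] buf;
    private int next = 0;  // index of the oldest entry, i.e. the next slot to overwrite
    private int size = 0;  // stays below capacity until the buffer has filled once

    public RingStats(int capacity) {
        buf = new int[capacity];
    }

    /** Replaces the oldest entry with x and returns how many stored values are smaller than x. */
    public int insertAndGetSmallerCount(int x) {
        buf[next] = x;
        next = (next + 1) % buf.length;   // advance the ring index, wrapping around
        if (size < buf.length)
            size++;                       // special behaviour for the first `capacity` numbers

        int less = 0;                     // linear scan: O(n), but cache-friendly
        for (int i = 0; i < size; i++)
            if (buf[i] < x)
                less++;
        return less;
    }
}
```

For n = 180 the scan touches a single contiguous int array, which is exactly the data locality the trees-and-maps version gives up.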


Edit: I find it funny to see that the "optimized" version runs five times slower than this trivial implementation (thanks to @Eiko for the analysis). I think this is because with trees and maps you lose data locality and take many more cache misses (not to mention memory allocation and garbage collection).

Motti answered Nov 12 '22


Add your numbers to a list. If its size exceeds 180, remove the first number. Counting is just iterating over the 180 elements, which is probably fast enough. It's hard to beat performance-wise.

Eiko answered Nov 12 '22