 

Calculating Percentiles on the fly

I'm programming in Java. Every 100 ms my program gets a new number.

It has a cache that contains the history of the last n = 180 numbers. When I get a new number x I want to calculate how many numbers in the cache are smaller than x. Afterwards I want to delete the oldest number from the cache.

Every 100 ms I want to repeat the process of calculating how many smaller numbers there are and delete the oldest number.

Which algorithm should I use? I would like to optimize for making the calculation fast, as it's not the only thing calculated in those 100 ms.

Christian asked Oct 19 '10

3 Answers

For practical reasons and reasonable values of n, you're best off with a ring buffer of primitive ints (to keep track of the oldest entry) and a linear scan to determine how many values are smaller than x.

In order for this to be in O(log n) you would have to use something like Guava's TreeMultiset. Here is an outline of how it would look.

import java.util.LinkedList;
import java.util.Queue;
import java.util.SortedMap;
import java.util.TreeMap;

class Statistics {

    private final static int N = 180;

    // FIFO of the raw values, so we know which entry is oldest
    Queue<Integer> queue = new LinkedList<Integer>();

    // value -> number of occurrences, kept in sorted key order
    SortedMap<Integer, Integer> counts = new TreeMap<Integer, Integer>();

    public int insertAndGetSmallerCount(int x) {

        queue.add(x);                                // O(1)
        counts.put(x, getCount(x) + 1);              // O(log N)

        int lessCount = 0;                           // O(N), unfortunately
        for (int i : counts.headMap(x).values())     // use Guava's TreeMultiset
            lessCount += i;                          // for O(log n)

        if (queue.size() > N) {                      // O(1)
            int oldest = queue.remove();             // O(1)
            int newCount = getCount(oldest) - 1;     // O(log N)
            if (newCount == 0)
                counts.remove(oldest);               // O(log N)
            else
                counts.put(oldest, newCount);        // O(log N)
        }

        return lessCount;
    }

    private int getCount(int x) {
        return counts.containsKey(x) ? counts.get(x) : 0;
    }

}

On my 1.8 GHz laptop, this solution performs 1,000,000 iterations in about 13 seconds (i.e. one iteration takes about 0.013 ms, well under the 100 ms budget).
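To make the headMap counting step concrete, here is a minimal stand-alone check of just that part (the class name and the sample values are made up for illustration):

```java
import java.util.TreeMap;

public class HeadMapCountDemo {

    /** Sum of multiplicities of all keys strictly below x, as in the loop above. */
    static int smallerCount(TreeMap<Integer, Integer> counts, int x) {
        int less = 0;
        for (int c : counts.headMap(x).values())  // headMap(x) excludes x itself
            less += c;
        return less;
    }

    public static void main(String[] args) {
        // value -> multiplicity for the cached window {5, 3, 8, 3, 9}
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int v : new int[] {5, 3, 8, 3, 9})
            counts.merge(v, 1, Integer::sum);

        System.out.println(smallerCount(counts, 8)); // prints 3: the 5 and the two 3s
    }
}
```

Note that the loop over headMap is what makes this O(N); Guava's TreeMultiset can answer the same query in O(log n) because it maintains subtree counts.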

13 revs answered Nov 12 '22


You can keep an array of 180 numbers and save an index to the oldest one, so that when a new number comes in you overwrite the number at the oldest index and increment the index modulo 180 (it's a bit more complex than that, since you need special behaviour for the first 180 numbers).

As for calculating how many numbers are smaller, I would use the brute-force way (iterate over all the numbers and count).
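A minimal sketch of this ring-buffer approach (class and method names are my own; here the oldest value is overwritten before the scan, which is harmless since x is never smaller than itself):

```java
public class RingStats {

    private final int[] buf;
    private int next = 0;  // index of the oldest entry, i.e. the next slot to overwrite
    private int size = 0;  // stays below capacity until the buffer has filled once

    public RingStats(int capacity) {
        buf = new int[capacity];
    }

    /** Replaces the oldest entry with x and returns how many stored values are smaller than x. */
    public int insertAndGetSmallerCount(int x) {
        buf[next] = x;
        next = (next + 1) % buf.length;   // advance the ring index, wrapping around
        if (size < buf.length)
            size++;                       // special behaviour for the first `capacity` numbers

        int less = 0;                     // linear scan: O(n), but cache-friendly
        for (int i = 0; i < size; i++)
            if (buf[i] < x)
                less++;
        return less;
    }
}
```

For n = 180 the scan touches a single contiguous int array, which is exactly the data locality the trees-and-maps version gives up.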


Edit: I find it funny to see that the "optimized" version runs five times slower than this trivial implementation (thanks to @Eiko for the analysis). I think this is because with trees and maps you lose data locality and take many more cache misses (not to mention memory allocation and garbage collection).

Motti answered Nov 12 '22


Add your numbers to a list. If its size exceeds 180, remove the first number. Counting is just iterating over the 180 elements, which is probably fast enough. It's hard to beat performance-wise.

Eiko answered Nov 12 '22