I'm programming in Java. Every 100 ms my program gets a new number.
It has a cache which contains the history of the last n = 180 numbers.
When I get a new number x, I want to calculate how many numbers in the cache are smaller than x.
Afterwards I want to delete the oldest number in the cache.
Every 100 ms I want to repeat the process of calculating how many smaller numbers there are and delete the oldest number.
Which algorithm should I use? I would like to make the calculation fast, as it's not the only thing computed in those 100 ms.
For practical reasons and reasonable values of n, you're best off with a ring buffer of primitive ints (to keep track of the oldest entry) and a linear scan to determine how many values are smaller than x.
For this to run in O(log n) you would have to use something like Guava's TreeMultiset. Here is an outline of how it would look:
class Statistics {

    private final static int N = 180;

    Queue<Integer> queue = new LinkedList<Integer>();
    SortedMap<Integer, Integer> counts = new TreeMap<Integer, Integer>();

    public int insertAndGetSmallerCount(int x) {
        queue.add(x);                              // O(1)
        counts.put(x, getCount(x) + 1);            // O(log N)

        int lessCount = 0;                         // O(N), unfortunately
        for (int i : counts.headMap(x).values())   // use Guava's TreeMultiset
            lessCount += i;                        // for O(log N)

        if (queue.size() > N) {                    // O(1)
            int oldest = queue.remove();           // O(1)
            int newCount = getCount(oldest) - 1;   // O(log N)
            if (newCount == 0)
                counts.remove(oldest);             // O(log N)
            else
                counts.put(oldest, newCount);      // O(log N)
        }
        return lessCount;
    }

    private int getCount(int x) {
        return counts.containsKey(x) ? counts.get(x) : 0;
    }
}
On my 1.8 GHz laptop, this solution performs 1,000,000 iterations in about 13 seconds (i.e. one iteration takes about 0.013 ms, well under 100 ms).
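The headMap loop above is what keeps the count at O(N); Guava's TreeMultiset avoids it because it maintains subtree sizes. If you'd rather stay in the standard library and your values fit a known bounded range, a Fenwick tree (binary indexed tree) gives a true O(log n) count. This is a sketch under that bounded-range assumption; the class name and the MAX bound are illustrative, not from the answer above:

```java
import java.util.ArrayDeque;

// Sliding-window "count smaller than x" via a Fenwick (binary indexed) tree.
// Assumes all values lie in [0, MAX); MAX is an illustrative bound.
class FenwickStatistics {
    private static final int N = 180;            // window size
    private static final int MAX = 1024;         // assumed exclusive upper bound on values
    private final int[] tree = new int[MAX + 1]; // 1-based Fenwick array of value counts
    private final ArrayDeque<Integer> window = new ArrayDeque<>();

    public int insertAndGetSmallerCount(int x) {
        window.addLast(x);
        update(x, +1);                           // O(log MAX)
        int lessCount = prefixCount(x - 1);      // values strictly smaller than x
        if (window.size() > N)
            update(window.removeFirst(), -1);    // evict the oldest entry, O(log MAX)
        return lessCount;
    }

    // Add delta to the count of `value`.
    private void update(int value, int delta) {
        for (int i = value + 1; i <= MAX; i += i & -i)
            tree[i] += delta;
    }

    // Number of stored values <= value.
    private int prefixCount(int value) {
        int sum = 0;
        for (int i = value + 1; i > 0; i -= i & -i)
            sum += tree[i];
        return sum;
    }
}
```

Whether this beats the linear scan for n = 180 is doubtful; its advantage only shows for much larger windows.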
You can keep an array of 180 numbers and save an index to the oldest one so that when a new number comes in you overwrite the number at the oldest index and increment the index modulo 180 (it's a bit more complex than that since you need special behaviour for the first 180 numbers).
As for calculating how many numbers are smaller, I would use brute force (iterate over all the numbers and count).
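A minimal sketch of that ring buffer plus linear scan; the class and field names are illustrative, and the size field provides the special behaviour for the first 180 numbers:

```java
// Ring buffer of primitive ints with a linear scan for the "smaller than x" count.
class RingStatistics {
    private static final int N = 180;
    private final int[] buf = new int[N];
    private int oldest = 0; // index of the slot to overwrite next
    private int size = 0;   // filled slots, < N only during the first 180 numbers

    public int insertAndGetSmallerCount(int x) {
        int lessCount = 0;
        for (int i = 0; i < size; i++) // O(N) scan, but cache-friendly
            if (buf[i] < x)
                lessCount++;
        buf[oldest] = x;               // overwrite the oldest entry
        oldest = (oldest + 1) % N;     // advance the index modulo N
        if (size < N)
            size++;
        return lessCount;
    }
}
```
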
Edit: I find it funny that the "optimized" version runs five times slower than this trivial implementation (thanks to @Eiko for the analysis). I think this is because trees and maps lose data locality and cause many more cache misses (not to mention memory allocation and garbage collection).
Add your numbers to a list. If its size exceeds 180, remove the first number. Counting is just iterating over the 180 elements, which is probably fast enough. It's hard to beat performance-wise.