getting the average, p95 and p99 of a stream of data

I have incoming data and I want to compute the average, 95th and 99th percentile of that data. I am most interested in the last 1000 values. I'd like to be able to query this object for any of the three values at any time, not just when the number of values seen is a multiple of 1000. Is there a way to get these three values without keeping the last 1000 samples?

This doesn't have to be perfect, so we can use some tricks to get a good estimate. Speed is also a concern. Thanks.

(I will be doing this in C++ but I don't think that matters all that much)

jamesatha asked May 08 '13 22:05




1 Answer

At a minimum, you'll need to maintain a queue of the most recent 1000 elements.

To keep a running average, maintain a running total of the most recent 1000 elements; when you add a new element to the queue you add its value to the total, and you also subtract the value of the oldest element that you've just removed from the queue. Return the total divided by 1000 (or by the number of elements seen so far, until the queue fills up) and there you go.
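The running-total idea above can be sketched in C++ as follows. This is a minimal illustration, not a tuned implementation: the class name `RunningAverage`, its members, and the use of `std::deque` as the queue are assumptions made for the example, and the capacity is assumed to be at least 1.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical helper: mean of the most recent `capacity` samples.
class RunningAverage {
public:
    explicit RunningAverage(std::size_t capacity) : capacity_(capacity) {}

    // Add a sample; once the window is full, evict the oldest one
    // and subtract it from the running total.
    void add(double x) {
        window_.push_back(x);
        total_ += x;
        if (window_.size() > capacity_) {
            total_ -= window_.front();
            window_.pop_front();
        }
    }

    // Mean of the samples currently in the window (0 if none yet).
    double mean() const {
        if (window_.empty()) return 0.0;
        return total_ / static_cast<double>(window_.size());
    }

private:
    std::size_t capacity_;
    std::deque<double> window_;
    double total_ = 0.0;
};
```

With floating-point samples the repeated add/subtract can accumulate rounding error over a very long stream, so recomputing the total from the queue every so often may be worthwhile.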

To keep a running Nth percentile, maintain two heaps and keep a count of the elements in each; the "lower" heap (a max-heap) has the lower N% of the values, and the "upper" heap (a min-heap) has the upper (100-N)% (for example, for the 95th percentile the lower heap will have 950 elements and the upper heap will have 50). At any point you can return the lowest element from the upper heap, and that's your percentile. When you remove an element from the queue of recent values, remove that value from whichever heap holds it as well. If this leaves the heaps unbalanced (e.g. the lower heap has 951 elements and the upper heap has 49 elements) then shift elements to balance them out (e.g. remove the top element from the lower heap and add it to the upper heap).

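Since you want two percentiles, use three heaps: the lower heap has the lowest 950 elements, the middle has the next 40, and the upper has the highest 10. Return the lowest element of the middle heap for the 95th percentile, and the lowest element of the upper heap for the 99th percentile. The middle heap needs access to both its lowest and its highest element, so a double-ended structure (for example a sorted multiset) is the practical choice there.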
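Here is a rough C++ sketch of the two-structure scheme for a single percentile. It swaps the heaps for `std::multiset`, because `std::priority_queue` cannot remove an arbitrary element, and it uses one instance per percentile rather than the three-heap layout. The class name `WindowPercentile` and its members are made up for the example, and it assumes a capacity of at least 1, a percent below 100, and no NaN samples.

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <set>

// Hypothetical helper: Nth percentile of the most recent `capacity` samples.
class WindowPercentile {
public:
    WindowPercentile(std::size_t capacity, unsigned percent)
        : capacity_(capacity), percent_(percent) {}

    void add(double x) {
        if (window_.size() == capacity_) {       // evict the oldest sample
            eraseOne(window_.front());
            window_.pop_front();
        }
        window_.push_back(x);
        // Place x on the side it belongs to, then restore the size split.
        if (!upper_.empty() && x >= *upper_.begin()) upper_.insert(x);
        else lower_.insert(x);
        rebalance();
    }

    // Lowest element of the upper part. Precondition: add() was called.
    double value() const { return *upper_.begin(); }

private:
    // Remove one copy of x from whichever side holds it.
    void eraseOne(double x) {
        auto it = upper_.find(x);
        if (it != upper_.end()) upper_.erase(it);
        else lower_.erase(lower_.find(x));
    }

    // Keep floor(n * percent / 100) elements in the lower part.
    void rebalance() {
        const std::size_t n = lower_.size() + upper_.size();
        const std::size_t wantLower = n * percent_ / 100;
        while (lower_.size() > wantLower) {      // move largest of lower up
            auto it = std::prev(lower_.end());
            upper_.insert(*it);
            lower_.erase(it);
        }
        while (lower_.size() < wantLower) {      // move smallest of upper down
            auto it = upper_.begin();
            lower_.insert(*it);
            upper_.erase(it);
        }
    }

    std::size_t capacity_;
    unsigned percent_;
    std::deque<double> window_;
    std::multiset<double> lower_, upper_;
};
```

Creating `WindowPercentile p95(1000, 95)` and `WindowPercentile p99(1000, 99)` and feeding both the same stream gives the two percentiles, at the cost of storing the window twice; the three-heap layout avoids that duplication.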

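Adding and removing heap elements is O(lg(n)), so that is the cost of adding a new element to the queue and three heaps: remove the oldest queue element from the heaps (O(lg(n))), add the new queue element to the appropriate heap (O(lg(n))), and balance the heaps if need be (again, O(lg(n))). Note that removing an arbitrary element from a binary heap in O(lg(n)) requires tracking each element's position in the heap; an ordered multiset backed by a balanced tree gives the same bound without that bookkeeping. Add the new element to the lowest heap whose highest element is greater than the new element, i.e.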

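// Pick the lowest heap whose maximum exceeds the new element.
if (newElement < lowestHeap.maxElement) {
    lowestHeap.add(newElement)
} else if (newElement < middleHeap.maxElement) {
    middleHeap.add(newElement)
} else {
    // Not below either maximum: it belongs among the highest values.
    highestHeap.add(newElement)
}
// Afterwards, rebalance so the heap sizes return to 950 / 40 / 10.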

Be sure that your heaps allow duplicate elements, since the same value can appear more than once in the window.
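If you go with C++ standard containers, one way to handle duplicates is `std::multiset` (a `std::set` would silently drop repeated values). The small helper below is an assumption made for illustration (the name `eraseOne` is invented); it shows the detail that is easy to get wrong when evicting an old sample: erase by iterator, not by value.

```cpp
#include <set>

// Hypothetical helper: remove a single copy of `x` from `s`.
// Returns false if `x` is not present.
inline bool eraseOne(std::multiset<double>& s, double x) {
    auto it = s.find(x);
    if (it == s.end()) return false;
    s.erase(it);  // iterator overload: removes exactly one element
    return true;
}
```

By contrast, `s.erase(x)` with a value argument removes every element equal to `x`, which would throw away samples that are still inside the window.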

Zim-Zam O'Pootertoot answered Nov 10 '22 22:11
