Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast calculation of min, max, and average of incoming numbers

Program is receiving approximately 50,000 numbers every second.

At ANY given moment, I need to calculate minimum, maximum and average of the values (numbers) that arrived in the last second (regarding to given moment).

Is there a way to do this without using array or list (buffer) to store arriving numbers and to calculate results?

If I need to use buffer, what would be the efficient way to achieve this?

(Note that numbers from buffer also must be efficiently removed from time to time)

like image 832
Dusan Avatar asked Apr 23 '12 20:04

Dusan


People also ask

How do you find minimum maximum and average?

Keep a minimum variable that is initialized to a high value, and update it if you see a lower value. Do the opposite with a maximum variable. Add up all numbers and divide that sum by the total count to get the average.

What is the used of sum average max and min function?

The Average function calculates the average, or arithmetic mean, of its arguments. The Max function finds the maximum value. The Min function finds the minimum value. The Sum function calculates the sum of its arguments.

What is min max average?

Min and max: Shows you the lowest (minimum) and highest (maximum) values in your column. Mean: Also called the average. The sum of all the values in your column, divided them by the total number of values. Median: The number that would be in the middle of an ordered list of your values.


1 Answers

Here is an algorithm that will somewhat work to save efficiency in certain cases:

  1. As events come in, buffer them completely, and calculate a running sum, count, min, max (trivial).

  2. When a request for average, min, or max is made, loop through from the back of the buffer and start removing values older than one second. Subtract from sum and count as you go.

    • If the values are all above min you can keep your min. If the values are below max, you can keep your max. In this scenario, you have average, min, and max updated efficiently.

    • If the values are below min or above max, you'll need to loop through the rest of the array and recalculate it.

  3. Do step two once a second or so also so that the buffer doesn't get too full. This code could be performed on every buffer insert also, or wherever made sense.

Best structure for this kind of work is a circular buffer, to avoid memory allocations and GC getting in the way. It should be large enough to cover the worst case scenario for message size per second.

Updates

Depending on the usage scenario one other thing to do would be to run the algorithm above but in 10 x 100ms chunks rather than 1 x 1000ms piece. That is, keep the running min, max, sum and count on those 10 chunks. Then when you reach an 'invalidation' scenario, you generally only need to look through the latest 100ms of data or a quick pass through the min and max of the other 9 chunks.


@ja72 provided a great idea to save on finding the min and max values if they are invalidated:

Instead of keeping the min/max values x_min, x_max keep instead the index of where they are located in the x[i] array with i_min and i_max. Then finding them can be trivial sometimes, but when the last value considered holds the min and max, the entire list needs to be scanned to establish the new limits.


Sam Holder had another good idea in the comments - keep a parallel array that is always sorted, this lets you lop numbers off the top or bottom to find new minimums and maximums easier. However, insert speed here is compromised a little (needs to remain in order).


Ultimately, the right choice will depend on the usage characteristics of the program. How often will values be read vs how often they are inserted?

like image 193
yamen Avatar answered Sep 21 '22 00:09

yamen