Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sliding Window over Time - Data Structure and Garbage Collection

I am trying to implement something along the lines of a Moving Average.

In this system, there are no guarantees of a quantity of Integers per time period. I do need to calculate the Average for each period. Therefore, I cannot simply slide over the list of integers by quantity as this would not be relative to time.

I can keep a record of each value with its associated time. We will have a ton of data running through the system so it is important to 'garbage collect' the old data.

It may also be important to note that I need to save the average to disk after the end of each period. However, they may be some overlap between saving the data to disk and having data from a new period being introduced.

What are some efficient data structures I can use to store, slide, and garbage collect this type of data?

like image 996
Kurtis Avatar asked Oct 15 '13 16:10

Kurtis


People also ask

What is a sliding time window?

Abstract: With the uses sliding time window to define the starting time and ending time of GPS data involved in the solution, the sliding time window makes the latest observed data added to the real-time data solution and raises efficiently the real-time property of the results of data analysis.

What type of algorithm is sliding window?

The Sliding window is a problem-solving technique of data structure and algorithm for problems that apply arrays or lists. These problems are painless to solve using a brute force approach in O(n²) or O(n³). However, the Sliding window technique can reduce the time complexity to O(n).

How does sliding window algorithm work?

The Sliding Window algorithm is one way programmers can move towards simplicity in their code. This algorithm is exactly as it sounds; a window is formed over some part of data, and this window can slide over the data to capture different portions of it.

Why is it used sliding window technique?

Window Sliding Technique is a computational technique which aims to reduce the use of nested loop and replace it with a single loop, thereby reducing the time complexity.


1 Answers

The description of the problem and the question conflict: what is described is not a moving average, since the average for each time period is distinct. ("I need to compute the average for each period.") So that admits a truly trivial solution:

For each period, maintain a count and a sum of observations.

At the end of the period, compute the average

I suspect that what is actually wanted is something like: Every second (computation period), I want to know the average observation over the past minute (aggregation period).

This can be solved simply with a circular buffer of buckets, each of which represents the value for one computation period. There will be aggregation period / computation period such buckets. Again, each bucket contains a count and a sum. Also, a current total/sum and a cumulative total sum/count are maintained. Each observation is added to the current total/sum.

At the end of a each computation period:

  • subtract the sum/count for the (circularly) first period from the cumulative sum/count
  • add the current sum/count to the cumulative sum/count
  • report the average based on the cumulative sum/count
  • replace the values of the first period with the current sum/count
  • clear the current sum/count
  • advance the origin of the circular buffer.

If you really need to be able to compute at any time at all the average of the previous observations over some given period, you'd need a more complicated data structure, basically an expandable circular buffer. However, such precise computations are rarely actually necessary, and a bucketed approximation, as per the above algorithm, is usually adequate for data purposes, and is much more sustainable over the long term for memory management, since its memory requirements are fixed from the start.

like image 107
rici Avatar answered Oct 19 '22 23:10

rici