Sliding Window over Time - Data Structure and Garbage Collection

Tags:

I am trying to implement something along the lines of a Moving Average.

In this system, there are no guarantees of a quantity of Integers per time period. I do need to calculate the Average for each period. Therefore, I cannot simply slide over the list of integers by quantity as this would not be relative to time.

I can keep a record of each value with its associated time. We will have a ton of data running through the system so it is important to 'garbage collect' the old data.

It may also be important to note that I need to save the average to disk after the end of each period. However, they may be some overlap between saving the data to disk and having data from a new period being introduced.

What are some efficient data structures I can use to store, slide, and garbage collect this type of data?

996

asked Oct 15 '13 16:10

Kurtis

1 Answers

The description of the problem and the question conflict: what is described is not a moving average, since the average for each time period is distinct. ("I need to compute the average for each period.") So that admits a truly trivial solution:

For each period, maintain a count and a sum of observations.

At the end of the period, compute the average

I suspect that what is actually wanted is something like: Every second (computation period), I want to know the average observation over the past minute (aggregation period).

This can be solved simply with a circular buffer of buckets, each of which represents the value for one computation period. There will be aggregation period / computation period such buckets. Again, each bucket contains a count and a sum. Also, a current total/sum and a cumulative total sum/count are maintained. Each observation is added to the current total/sum.

At the end of a each computation period:

subtract the sum/count for the (circularly) first period from the cumulative sum/count
add the current sum/count to the cumulative sum/count
report the average based on the cumulative sum/count
replace the values of the first period with the current sum/count
clear the current sum/count
advance the origin of the circular buffer.

If you really need to be able to compute at any time at all the average of the previous observations over some given period, you'd need a more complicated data structure, basically an expandable circular buffer. However, such precise computations are rarely actually necessary, and a bucketed approximation, as per the above algorithm, is usually adequate for data purposes, and is much more sustainable over the long term for memory management, since its memory requirements are fixed from the start.

107

answered Oct 19 '22 23:10

rici

Related questions
                            
                                existential search and query without the fuss
                            
                                Efficiently find all sets S from a collection of sets C that are contained in a set D
                            
                                Fast solution to Subset sum algorithm by Pisinger
                            
                                Is there an algorithm to find the Shannon entropy for text?
                            
                                Dynamic programming exercise for string cutting
                            
                                Meaning of the terms O(1) space and without using extra space
                            
                                An efficient sorting algorithm for almost sorted list containing time data?
                            
                                What spatial indexing algorithm should I use?
                            
                                Weighted random sample in python
                            
                                Optimal sequence to brute force solve a keypad code lock [duplicate]
                            
                                Partitioning a String of Brackets
                            
                                How to compress pointer ? eg. arbitrary bit pointer
                            
                                Boyer-Moore-Horspool Algorithm for All Matches (Find Byte array inside Byte array)
                            
                                Given a set of n points (x,y), is it possible to find the number of pairs of points with negative slopes between them in O(n logn) time?
                            
                                How causal consistency is different to sequential consistency?
                            
                                max-weight k-clique in a complete k-partite graph
                            
                                Algorithm to find MST in a huge complete graph
                            
                                Do processors actually calculate multiplication by a zero or one? Why?
                            
                                How does Google News automatically categorize articles into Tech/Science/Health/Entertainment/etc?
                            
                                Pebbling a Checkerboard with Dynamic Programming

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Sliding Window over Time - Data Structure and Garbage Collection

Tags:

algorithm

data-structures

jvm

sliding-window

Kurtis

People also ask

1 Answers

rici

Recent Activity

Donate For Us