Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate running mean traffic for last minute

I have a python server that accepts time series data. Now I need to calculate the average traffic for the last minute, output like 90 samples/minute. I'm currently using a python list to hold all time stamps and use a pretty awful way(in my opinion) to calculate this. The code roughly looks like this:

class TrafficCalculator(object):
    timestamps = []

    def run():
        while True:
            # this gets one record of traffic
            data = self.accept_data()
            # get record's timestamp
            timestamp = data.timestamp
            # add to list
            self.timestamps.append(timestamp)
            # get the time one minute ago
            minute_ago = timestamp - datetime.timedelta(minutes=1)
            # find out the first index of the timestamp in the past that's within 1 minute
            for i, t in enumerate(self.timestamp):
                if t > minute_ago:
                    break
            # see how many records are within last minute
            result = len(self.timestamp[i:])
            # throw away the earlier data
            self.timestamp = self.timestamp[i:]

As you can see, I have to do this for every record, if my traffic gets intense, the performance is miserable.

Is there a better data structure or algorithm I can use to make this more performant? Even further, how do I write a test to verify my algorithm? Thanks!

like image 353
Shang Wang Avatar asked Nov 26 '25 10:11

Shang Wang


1 Answers

Use Queue to hold <traffic, timestamp> pair. Here timestamp is the time it has been pushed on Queue(arrives from server). Track the sum of the traffics of Queue. When a new traffic arrives and the difference between its timestamp and Queue's front element's timestamp more than 1 minute, pop front from Queue. And subtract the poped traffic value from sum. Push the new traffic into queue and add to sum.

This way, your queue is working as a window frame to hold the 1 minute traffic all the time. And you are tracking the sum and you know the Queue size, so you can calculate the average.

The space complexity is O(maximum traffic can be arrived within 1 minute). Time complexity is O(1) for getting average at any time.

This is a very conventional algorithm for Query on any running stream of data in constant time complexity.

Note: Unfortunately I don't know Python. Otherwise I would put the implementation.

like image 97
Kaidul Avatar answered Nov 28 '25 01:11

Kaidul



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!