Say you have a server that constantly gets HTTP requests. Your boss needs some stats, and asks you to compute the number of hits within the last minute at any given time.
What algorithm and data-structure would you use to achieve this?
Use a circular buffer.
Whenever you have to keep some current statistics with a built-in obsolescence, a ring buffer is a good candidate. In your case, you can easily keep count of the requests in the last minute by inserting new packets at the front of the circular buffer and keeping a one-minute-before-now pointer in the buffer, or performing a binary search on request time.
An dynamic array, of which an append-only file is the on-disk counterpart. Just append each hit to the array as it comes in, so it's sorted by time. Appending takes amortized constant time. You can do a binary search (or interpolation search) to find the point where the last minute starts, then sum up to the end in O(lg n) (or (O(lg lg n)) time.
Alternatively, do a linear scan from the end instead of binary search, which will be faster if the file grows very large and you only ever need the last minute. If the expected number of hits in the last minute is independent of time, this takes only constant time (but reading a file on a spinning disk backwards can be slow).
If you don't have the space to store all the old data, then using a deque is a better idea. Good implementations of deques are available in the standard libraries of, e.g., C++ and Python.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With