Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to store a continuous stream of data so that you can access the most frequent values?

A stream of stock updates are coming in the form of ticker, buy/sell pair (example YHOO,300) from the securities exchange(at a very high rate). We have to always display the top 5 most traded stock by volume at any given point of time. How would you maintain these values in memory?

My idea is to use a hashmap(Ticker->Volume) to update our datastore from the feed O(1).And then create a Treemap based on the volume to display the top5 stocks O(1). But what I'm not able to come up is an efficient way to sync the Treemap data as the values in the hashmap change (very frequently) . Any suggestions?

like image 700
Raj Avatar asked Nov 09 '22 21:11

Raj


1 Answers

If the number of trades is an always increasing cumulative value (which it seems to be), you can just maintain:

  • Your hashmap for updating the number of trades in O(1).
  • A simple array of 5 elements representing the top 5 tickers along with their volume.

When a new update arrives, you update your hashmap in O(1) and, if the new number of trades is big enough to enter the top 5, you update also your array. Checking the array and updating it can be done in O(1). You can also use binary search if you don't want to compare the new volume with all 5 values.

You can't do much better than checking an average of log(5) values.


If the number of trades in not increasingly monotone (which seems unlikely, but can happen if you want to consider the number of trades done in a bounded interval of time), you can also maintain:

  • Your hashmap (Ticker -> Volume) for saving the updates on ticker volumes.
  • For the top 5, you need to use a heap sorted by decreasing volumes.

When an update arrives you will:

  1. Get the previous value (if any) of the Volume for the ticker from the hashmap in O(1)
  2. Remove the old value (if present) from the heap in O(log n), looking up by the old volume
  3. Add the new value in the heap in O(log n)
  4. Update the new value in the hashmap in O(1)

The total complexity is O(log n) where "n" is the number of tickers.

Even if you need the top 5 tickers, when the volume of one ticker goes down, you need to find the 6th element that will become 5th, and a 7th element .. and so on. I don't think you can go much lower than O(log n) in the complexity if the number of trades is not monotone.

like image 198
Nicola Ferraro Avatar answered Nov 14 '22 21:11

Nicola Ferraro