
Incremental price graph approximation

I need to display a cryptocurrency price graph similar to the one on CoinMarketCap: https://coinmarketcap.com/currencies/bitcoin/

There could be gigabytes of data for one currency pair over a long period of time, so sending all the data to the client is not an option. After doing some research I ended up using the Douglas-Peucker line-approximation algorithm: https://www.codeproject.com/Articles/18936/A-C-Implementation-of-Douglas-Peucker-Line-Appro It reduces the number of points sent to the client, but there's one problem: every time new data arrives I have to reprocess all the data on the server, and since I'd like to update the client in real time, that takes a lot of resources.
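For reference, here is a minimal sketch of the Douglas-Peucker reduction I'm using (the `Point` struct and function names are illustrative, not taken from the linked implementation):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };  // x = timestamp, y = price

// Perpendicular distance from p to the line through a and b.
static double perpDistance(const Point& p, const Point& a, const Point& b) {
    double dx = b.x - a.x, dy = b.y - a.y;
    double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
}

// Recursive Douglas-Peucker: keep any point farther than epsilon
// from the line joining the segment's endpoints.
static void dpRecurse(const std::vector<Point>& pts, std::size_t first,
                      std::size_t last, double epsilon,
                      std::vector<Point>& out) {
    double maxDist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        double d = perpDistance(pts[i], pts[first], pts[last]);
        if (d > maxDist) { maxDist = d; index = i; }
    }
    if (maxDist > epsilon) {
        dpRecurse(pts, first, index, epsilon, out);  // left half
        dpRecurse(pts, index, last, epsilon, out);   // right half
    } else {
        out.push_back(pts[last]);  // the first point is added by the caller
    }
}

std::vector<Point> douglasPeucker(const std::vector<Point>& pts, double epsilon) {
    if (pts.size() < 3) return pts;
    std::vector<Point> out{pts.front()};
    dpRecurse(pts, 0, pts.size() - 1, epsilon, out);
    return out;
}
```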

So I'm thinking about some kind of progressive algorithm where, say, if I need to display data for the last month, I can split the data into 5-minute intervals, preprocess only the last interval, and when it's complete, remove the first one. I'm debating whether to customise the Douglas-Peucker algorithm (though I'm not sure it fits this scenario) or to find an algorithm designed for this purpose (any hint would be highly appreciated).

asked Nov 20 '18 by SiberianGuy


2 Answers

Constantly re-computing all the reduction points whenever new data arrives would change your graph continuously, so it would lack consistency: the graph seen by one user would differ from the graph seen by another, it would change when a user refreshes the page (this shouldn't happen!), and even after a server/application shutdown your data needs to be consistent with what it was before.

This is how I would approach it:

Keep your already-reduced points as they are. Suppose you receive data every second and you have computed reduced points for a 5-minute-interval graph; save those points in a limiting queue. Now gather the per-second data for the next 5 minutes, perform the reduction on those 300 data points, and append the reduced result to your limiting queue.

I would make the queue synchronised: the main thread returns the points in the queue whenever there is an API call, and a worker thread computes the reduction on the 5-minute data once the data for the entire 5-minute interval is available.
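A sketch of that design in C++ (the class names are made up, and the per-chunk "reduction" is a placeholder that keeps only the first and last tick; real code would run Douglas-Peucker on the chunk instead):

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Tick { long ts; double price; };

// Bounded queue of already-reduced points: appending a new 5-minute chunk
// pushes the oldest points out, so the served graph is a fixed-size window.
class LimitingQueue {
public:
    explicit LimitingQueue(std::size_t maxPoints) : maxPoints_(maxPoints) {}

    void append(const std::vector<Tick>& reducedChunk) {
        std::lock_guard<std::mutex> lock(m_);
        for (const Tick& t : reducedChunk) q_.push_back(t);
        while (q_.size() > maxPoints_) q_.pop_front();
    }

    std::vector<Tick> snapshot() const {  // what an API call returns
        std::lock_guard<std::mutex> lock(m_);
        return {q_.begin(), q_.end()};
    }

private:
    std::size_t maxPoints_;
    std::deque<Tick> q_;
    mutable std::mutex m_;
};

// Buffers per-second ticks; every chunkSize ticks it "reduces" the chunk
// (placeholder: first and last tick only) and appends it to the queue.
class ChunkReducer {
public:
    ChunkReducer(LimitingQueue& out, std::size_t chunkSize)
        : out_(out), chunkSize_(chunkSize) {}

    void onTick(const Tick& t) {
        buf_.push_back(t);
        if (buf_.size() == chunkSize_) {
            out_.append({buf_.front(), buf_.back()});  // reduced chunk
            buf_.clear();
        }
    }

private:
    LimitingQueue& out_;
    std::size_t chunkSize_;
    std::vector<Tick> buf_;
};
```

The feed handler calls `onTick` for every incoming price (from the worker thread), while the API thread just returns `snapshot()`; the mutex keeps the two sides consistent.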

answered Nov 16 '22 by Vineeth Chitteti


I'd use a tree.

Each node contains a "precision" and an "average" value.

"Precision" means the date range: for example, 1 minute, 10 minutes, 1 day, or 1 month. It also corresponds to a level in the tree.

"Average" is the value that best represents the price over a range. You can use a simple average, a linear regression, or whatever you decide is "best".

So if you need 600 points (say, derived from the window size), you can find the precision with prec = total_date_range / 600, rounded to one of your existing ranges.

Now that you have `prec`, you just need to retrieve the nodes at that level.

With gigabytes of data, I'd slice them into std::vector objects. The tree would store ids of these vectors in its lowest nodes; the rest of the nodes could also be implemented as indices into vectors.

Updating with new data only requires updating one branch (or even creating a new one), starting from the root, and it touches only a few sub-nodes.
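A sketch of the idea in C++ (illustrative names; for simplicity each level here averages pairs from the level below, so "precision" doubles per level rather than following fixed calendar ranges, and the levels are plain vectors instead of a pointer-based tree):

```cpp
#include <cstddef>
#include <vector>

// Multi-resolution price store: levels_[0] holds raw points, each higher
// level holds the pairwise averages of the level below.
class MultiResSeries {
public:
    void append(double price) {
        if (levels_.empty()) levels_.emplace_back();
        levels_[0].push_back(price);
        // Whenever a level reaches an even count, its last two entries
        // form a completed bucket for the level above; only this tail
        // "branch" is touched, never the whole series.
        for (std::size_t lvl = 0; levels_[lvl].size() % 2 == 0; ++lvl) {
            const std::vector<double>& cur = levels_[lvl];
            double avg = (cur[cur.size() - 2] + cur.back()) / 2.0;
            if (lvl + 1 == levels_.size()) levels_.emplace_back();
            levels_[lvl + 1].push_back(avg);
        }
    }

    // Coarsest level that still yields at least maxPoints points, i.e.
    // prec = total_date_range / maxPoints, rounded to an existing level.
    // Assumes at least one point has been appended.
    const std::vector<double>& select(std::size_t maxPoints) const {
        for (std::size_t lvl = levels_.size(); lvl-- > 0; )
            if (levels_[lvl].size() >= maxPoints) return levels_[lvl];
        return levels_.front();  // fall back to raw data
    }

private:
    std::vector<std::vector<double>> levels_;  // levels_[0] = raw points
};
```

Serving a chart is then just `select(600)` on the stored series, and each incoming price updates at most one bucket per level, which matches the "update one branch" property above.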

answered Nov 16 '22 by Ripi2