
Design suggestions for real-time data aggregation?

I'm looking to build some data aggregation stuff in C#, and I'd like something akin to a real-time pivot table, or some sort of continuously updating SQL query, with support for select, sum, average, first, where, and group-by (where first is in the LINQ sense of "give me the first value").

For example, I might have some sort of table object called Trans with columns Name, Date, and Total, and another table called Price with columns Name and Price. I want to create some sort of Query instance that does (in pseudo-SQL)

select Name, sum(Total), first(Price) from Trans, Price join on Name group by Name

and pass that to an Aggregator instance that has links to the data sources. Along with this I want to register a callback that is hit whenever a row that the query produces changes. So if the price for the entity named 'XYZ' changes, the callback would trigger with an object containing the new values for that aggregated row. I'd also like the Aggregator to be as efficient as possible, so it would have some sort of indexing scheme so it wouldn't need to do a table scan whenever values changed.

I'm not quite sure what to call this sort of thing, and I'm hoping to be able to implement something entirely in C#, assuming it's not an order of magnitude more complex than I think it might be. I've read about Continuous LINQ and Bindable LINQ, but I couldn't really sense if either fits this problem, or if there would be performance issues (e.g. LINQ aggregations enumerating across the entire table whenever a value changes).

Does anyone know of a project that does something like this I could look at, or have suggestions on how to design/build it myself?

edit: I should note that the data wouldn't actually be in a database, it would be in memory.

toasteroven asked Oct 14 '22 03:10



1 Answer

The first alternative is to maintain the aggregate as the underlying data changes, i.e. when I update a totals record, I also update the sum total. To do it this way you need the old value, and it adds overhead to every change you make to an aggregated value. But if the whole purpose of the data is to be aggregated, it might be a viable option.

I do this in my bank balancing app: whenever I insert, modify, or delete a transaction, the logic also updates the account balance, because the balance is read many times and soon becomes expensive to calculate when there are many transactions.
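As a rough illustration of that idea, here is a minimal sketch of a per-Name running sum that is adjusted by the difference between the old and new value, so no scan of the transactions is needed. All names here (`RunningTotals`, `Insert`, `Update`, `Delete`, `RowChanged`) are made up for the example, and it assumes single-threaded access and that the caller can supply the old value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keeps one running sum per Name, keyed in a dictionary
// so a change touches a single entry instead of re-summing every row.
public sealed class RunningTotals
{
    private readonly Dictionary<string, decimal> _sums = new Dictionary<string, decimal>();

    // Raised with the group key and its new sum whenever that aggregated row changes.
    public event Action<string, decimal> RowChanged;

    public void Insert(string name, decimal total) => ApplyDelta(name, total);

    public void Delete(string name, decimal oldTotal) => ApplyDelta(name, -oldTotal);

    // The old value is required here: only the difference is applied to the sum.
    public void Update(string name, decimal oldTotal, decimal newTotal) =>
        ApplyDelta(name, newTotal - oldTotal);

    public decimal SumFor(string name) => _sums.TryGetValue(name, out var sum) ? sum : 0m;

    private void ApplyDelta(string name, decimal delta)
    {
        if (delta == 0m) return;                   // nothing changed, no callback
        _sums.TryGetValue(name, out var current);  // current stays 0 if the key is missing
        var updated = current + delta;
        _sums[name] = updated;
        RowChanged?.Invoke(name, updated);
    }
}

public static class RunningTotalsDemo
{
    public static void Main()
    {
        var totals = new RunningTotals();
        totals.RowChanged += (name, sum) => Console.WriteLine($"{name}: {sum}");

        totals.Insert("XYZ", 10m);       // sum for XYZ is now 10
        totals.Insert("XYZ", 5m);        // sum for XYZ is now 15
        totals.Update("XYZ", 5m, 7m);    // sum for XYZ is now 17
        totals.Delete("XYZ", 10m);       // sum for XYZ is now 7
    }
}
```

The same delta approach works for a count or an average (kept as sum and count), but not directly for something like min or max, where removing the current extreme forces a recalculation.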

I think you may also have structural problems, such as locking issues, if the sums are stored in the database. I'd always keep these values in memory.

Update: another possible solution is to pass your data access code through a maintenance layer that keeps aggregated values in memory. This would be blisteringly quick, with virtually zero overhead on inserting, updating, or deleting the underlying data. You could also get clever and make this layer transactional, so that if the data access action fails, you can roll back your aggregation change.

The only downside is that database changes must go through the layer to avoid invalidating the aggregation, and it will need initializing from the database on first run or restart.
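A minimal sketch of such a layer might look like the following. The names (`AggregatingLayer`, `Initialize`, `Write`) and the use of a plain `Action` to stand in for the real data access call are assumptions made for the example, and it assumes single-threaded use rather than real transaction support.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every write goes through this layer, which keeps the
// per-Name sums in memory and undoes its own change if the write fails.
public sealed class AggregatingLayer
{
    private readonly Dictionary<string, decimal> _sums = new Dictionary<string, decimal>();

    // Rebuilds the sums from the stored rows; needed on first run or after a restart.
    public void Initialize(IEnumerable<KeyValuePair<string, decimal>> rows)
    {
        _sums.Clear();
        foreach (var row in rows)
        {
            _sums.TryGetValue(row.Key, out var current);
            _sums[row.Key] = current + row.Value;
        }
    }

    public decimal SumFor(string name) => _sums.TryGetValue(name, out var sum) ? sum : 0m;

    // Applies the delta to the in-memory sum, then runs the data access action.
    // If the action throws, the previous state is restored and the error rethrown.
    public void Write(string name, decimal delta, Action dataAccess)
    {
        bool existed = _sums.TryGetValue(name, out var before);
        _sums[name] = before + delta;
        try
        {
            dataAccess();
        }
        catch
        {
            if (existed) _sums[name] = before;
            else _sums.Remove(name);
            throw;
        }
    }
}

public static class AggregatingLayerDemo
{
    public static void Main()
    {
        var layer = new AggregatingLayer();
        layer.Initialize(new[]
        {
            new KeyValuePair<string, decimal>("XYZ", 10m),
            new KeyValuePair<string, decimal>("XYZ", 5m),
        });

        layer.Write("XYZ", 3m, () => { /* the real insert would go here */ });

        try
        {
            layer.Write("XYZ", 100m, () => throw new InvalidOperationException("write failed"));
        }
        catch (InvalidOperationException)
        {
            // The failed write left the in-memory sum untouched.
        }

        Console.WriteLine(layer.SumFor("XYZ"));
    }
}
```

Anything that writes to the underlying data without going through `Write` would leave the sums stale, which is the downside noted above.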

Adam Houldsworth answered Oct 18 '22 13:10
