Are there any specialized databases - RDBMS, NoSQL, key-value, or anything else - that are optimised for running fast aggregate queries or map-reduce jobs like this over very large data sets:
select date, count(*)
from Sales
where [various combinations of filters]
group by date
So far I've run benchmarks on MongoDB and SQL Server, but I'm wondering if there's a more specialized solution, preferably one that scales horizontally.
In my experience, the real issue is less about aggregate query performance, which I've found to be good in every major database I've tried, and more about how the queries are written.
I've lost count of the number of times I've seen enormous report queries with huge numbers of joins and inline subquery aggregates all over the place.
Off the top of my head, the typical steps to make these things faster are listed below (a short SQL sketch of each step follows the list):
Use window functions where available and applicable (i.e. the over () operator). There's absolutely no point in refetching data multiple times.
Use common table expressions (with queries) where available and applicable (i.e. sets that you know will be reasonably small).
Use temporary tables for large intermediate results, and create indexes on them (and analyze them) before using them.
Work on small result sets by filtering rows earlier when possible: select id, aggregate from (aggregate on id) where id in (?) group by id can be made much faster by rewriting it as select id, aggregate from (aggregate on id where id in (?)) group by id.
Use union all/except all/intersect all rather than union/except/intersect where applicable. This skips the deduplication pass (typically a sort or hash) over the result sets, which is pointless when duplicates are impossible or harmless.
As a bonus, the first three steps all tend to make the report queries more readable and thus more maintainable.
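To make the window-function point concrete, here is a minimal sketch assuming a Sales table with sale_date and amount columns (both names are illustrative): the correlated-subquery form re-aggregates Sales once per output row, while over () computes the same total in a single pass.

-- Correlated subquery: Sales is re-aggregated for every output row.
select s.sale_date,
       s.amount,
       (select sum(s2.amount)
        from Sales s2
        where s2.sale_date = s.sale_date) as daily_total
from Sales s;

-- Window function: the same daily total in one pass over the data.
select sale_date,
       amount,
       sum(amount) over (partition by sale_date) as daily_total
from Sales;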
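For the CTE step, a sketch that defines a small aggregate once and references it twice (region is another assumed column; note that whether the engine materializes the CTE once or inlines it each time varies by database):

with region_totals as (
    select region, sum(amount) as total
    from Sales
    group by region
)
-- The aggregate is named once and used twice,
-- instead of repeating the same subquery inline.
select region, total
from region_totals
where total > (select avg(total) from region_totals);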
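For the temporary-table step, a sketch in PostgreSQL-flavoured syntax; the exact statements vary by engine (SQL Server, for instance, uses #tables and update statistics instead of analyze), and 'EMEA' is just an illustrative filter value.

-- Materialize the large intermediate result once.
create temporary table daily_totals as
select sale_date, region, sum(amount) as total
from Sales
group by sale_date, region;

-- Index and analyze it so the planner has an access path and real
-- statistics before the final query runs.
create index idx_daily_totals_region on daily_totals (region);
analyze daily_totals;

select sale_date, total
from daily_totals
where region = 'EMEA';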
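The early-filtering rewrite, spelled out with the same assumed columns:

-- Slower: every id in Sales is aggregated, then most groups are discarded.
select id, total
from (select id, sum(amount) as total
      from Sales
      group by id) t
where id in (1, 2, 3);

-- Faster: only the requested ids are ever aggregated.
select id, sum(amount) as total
from Sales
where id in (1, 2, 3)
group by id;

Many modern planners push this kind of predicate down automatically, but writing the filter inside the aggregation guarantees it happens regardless of the optimizer.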
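And for the last step, assuming a hypothetical ArchivedSales table with the same shape as Sales:

-- union would sort or hash both inputs to eliminate duplicates;
-- union all simply concatenates them, which is much cheaper when the
-- inputs are already known to be disjoint.
select sale_date, amount from Sales
union all
select sale_date, amount from ArchivedSales;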