I can't help thinking that there aren't many use cases that Cassandra serves better than Druid. Whether as a time series store or a key-value store, queries can be written in Druid to extract the data however needed. The argument here is more about justifying Druid than Cassandra.
Apart from the fast writes in Cassandra, is there really anything else? Especially given Druid's real-time aggregation and querying capabilities, doesn't it outweigh Cassandra?
To put it as a more direct question that can be answered: doesn't Druid provide a superset of features compared to Cassandra, and wouldn't one be better off using Druid right away? For all use cases?
Not at all, they are not comparable. We are talking about two very different technologies here. An easy way to see it is that Cassandra is a distributed storage solution, while Druid is a distributed aggregator (i.e. an awesome open-source OLAP-like tool (: ). The post you are referring to is, in my opinion, a bit misleading in the sense that it compares the two projects in the world of data mining, which is not Cassandra's focus.
Druid is not good at point lookups, at all. It loves time series, and its partitioning is mainly based on date-based segments (e.g. hourly, monthly, etc. segments that may be further sharded based on size).
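To make the segment point concrete, here is a toy sketch in plain Python. It does not use Druid's API; `segment_start`, `build_segments` and `lookup_by_key` are hypothetical names made up for illustration. It only shows why bucketing rows by time makes time-range scans cheap while a lookup by key has to touch every bucket.

```python
from collections import defaultdict
from datetime import datetime


def segment_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its hourly or daily bucket."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def build_segments(events, granularity="hour"):
    """Group (timestamp, key, value) rows into time-based buckets."""
    segments = defaultdict(list)
    for ts, key, value in events:
        segments[segment_start(ts, granularity)].append((key, value))
    return dict(segments)


def lookup_by_key(segments, wanted):
    """A key lookup carries no time bound, so every bucket is scanned."""
    return [
        value
        for rows in segments.values()
        for key, value in rows
        if key == wanted
    ]
```

In this sketch a query for "everything between 10:00 and 11:00" reads a single bucket, whereas `lookup_by_key` walks all of them. A store partitioned by key, which is how Cassandra is usually described, has the opposite trade-off.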
Druid pre-aggregates your data based on pre-defined aggregators, which are numeric (e.g. summing the number of click events on your website with daily granularity). If one wants to store a key lookup from a string to, say, another string or an exact number, Druid is the worst solution one can look for.
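As a rough illustration of the difference, the following plain-Python sketch contrasts a daily rollup with a key lookup. Nothing here is Druid or Cassandra code; `rollup_daily_clicks`, the page paths and the `profiles` mapping are hypothetical and exist only to show the shape of the two workloads.

```python
from collections import defaultdict
from datetime import date, datetime


def rollup_daily_clicks(events):
    """Collapse (timestamp, page) click events into per-day counts.

    Only the numeric aggregate survives; the individual events do not.
    """
    totals = defaultdict(int)
    for ts, page in events:
        totals[(ts.date(), page)] += 1
    return dict(totals)


clicks = [
    (datetime(2024, 1, 1, 9, 0), "/home"),
    (datetime(2024, 1, 1, 17, 30), "/home"),
    (datetime(2024, 1, 2, 8, 15), "/home"),
]
daily = rollup_daily_clicks(clicks)

# A key-value workload looks different: an exact value per key, no aggregation.
# (Hypothetical data; this is the access pattern a key-value store is built for.)
profiles = {"user:42": "alice@example.com"}
email = profiles["user:42"]
```

After the rollup, `daily` holds one count per (day, page) pair and the original timestamps are gone, which is fine for analytics but useless if the goal is to fetch one exact stored value by its key.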
Not sure this is really an SO type of question, but the easy answer is that it's a matter of use case. Simply put, Druid shines when it facilitates very fast ad-hoc queries on data that has been ingested in real time. It's read consistent now, and you are not limited to pre-computed queries to get speed. On the other hand, you can't write to the data it holds; you can only overwrite it.
Cassandra (from what I've read; I haven't used it) is more of an eventually consistent data store that supports writes and does very nicely with pre-computation. It's not intended to continuously ingest data while providing real-time access to ad-hoc queries on that same data.
In fact, the two could work together, as has been proposed on planetcassandra.org in "Cassandra as a Deep Storage Mechanism for Druid Real-Time Analytics Engine!".