
Druid vs Elasticsearch

I'm new to Druid. I've already read "Druid vs Elasticsearch", but I still don't know what Druid is good at.

Below is my problem:

  1. I have a Solr cluster with 70 nodes.

  2. I have a very large table in Solr with 1 billion rows, and each row has 100 fields.

  3. Users run range queries over different combinations of fields (at least 20 combinations in one query) to count the distinct customer IDs. Solr's distinct count algorithm is very slow and uses a lot of memory, so if the query result exceeds 200 thousand, the Solr query node crashes.

Does Druid perform better than Solr at distinct counts?

zhouxiang asked Aug 24 '16 09:08



1 Answer

Druid is very different from search-oriented databases like ES/Solr. It is a database designed for analytics, where you can do rollups, column filtering, probabilistic computations, etc.

Druid does count distinct using HyperLogLog, a probabilistic data structure. So if you don't need 100% accuracy, you can definitely try Druid; I have seen drastic improvements in response times in one of my projects. But if you need exact counts, Druid might not be the best solution (exact counting is possible in Druid as well, at the cost of slower queries and extra space). See more here: https://groups.google.com/forum/#!topic/druid-development/AMSOVGx5PhQ
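To give a feel for why a HyperLogLog-style count stays cheap in memory, here is a minimal textbook sketch in Python. It is not Druid's actual implementation; the class name, the register count (`p=12`) and the choice of SHA-1 as the hash are all assumptions made for illustration. The point is that the state is a fixed array of small registers, however many distinct customer IDs are added.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog sketch (illustrative, not Druid's variant)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers: 4096 for p=12
        self.registers = [0] * self.m   # fixed-size state

    def add(self, item):
        # Derive a 64-bit hash from the item.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # The first p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining bits determine the rank: the position of the
        # leftmost 1-bit (all zeros gives the maximum rank).
        width = 64 - self.p
        rest = h & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Bias-correction constant, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100_000):
        hll.add(f"customer-{i}")
        hll.add(f"customer-{i}")        # duplicates do not change the estimate
    print(f"estimated distinct customers: {hll.count():.0f}")
```

With 4096 registers the typical relative error is roughly 1.04 / sqrt(4096), about 1.6%. That is the trade-off described above: bounded memory and mergeable state in exchange for an approximate answer.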

Ramkumar Venkataraman answered Sep 18 '22 23:09
