Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can druid replace hadoop?

Tags:

hadoop

druid

Druid is used for both real time and batch processing. But can it totally replace hadoop? If not why? As in what is the advantage of hadoop over druid? I have read that druid is used along with hadoop. So can the use of Hadoop be avoided?

like image 937
Amit Sharma Avatar asked Jun 09 '14 14:06

Amit Sharma


People also ask

What is Druid in Hadoop?

Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data. Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation.

What is deep storage in Druid?

It is a storage mechanism that Apache Druid does not provide. This deep storage infrastructure defines the level of durability of your data, as long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose.

What is Hadoop used for?

Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly.


1 Answers

Can you avoid using Hadoop with Druid? Yes, you can stream data in real-time into a Druid cluster rather than batch-loading it with Hadoop. One way to do this is to stream data into Kafka, which will handle incoming events and pass them into Storm, which can then process and load them into Druid Realtime nodes.

Typically this setup is used with Hadoop in parallel, because streamed real-time data comes with its own baggage and often needs to be fixed up and backfilled. That whole architecture has been dubbed "Lambda" by some.

like image 169
user766353 Avatar answered Sep 29 '22 20:09

user766353