What is the difference between Apache Flume and Apache Storm?


  • Is it possible to ingest log data into a Hadoop cluster using Storm?
  • Both are used for streaming data, so can Storm be used as an alternative to Flume?
Hassam asked Nov 03 '17 17:11


1 Answer

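  • Apache Flume is a service for collecting, aggregating, and moving large amounts of streaming data, particularly logs. A Flume agent reads events from a source, buffers them in a channel, and pushes them to consumers through components it calls sinks. Flume ships with sinks for several popular destinations, including HDFS, HBase, Hive, and Kafka; sinks for other stores, such as Cassandra or relational databases, are typically added as custom or third-party plugins.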
  • Apache Storm is a distributed system for real-time computation over streaming data, a workload that Hadoop's batch-oriented model is not natively designed to handle. Storm runs continuously, processing an unbounded stream of incoming tuples as they arrive; its output can be grouped into batches so that Hadoop can ingest it more easily. Data sources are called spouts, and each processing node is a bolt. Spouts and bolts are wired together into a topology. Bolts perform computations on the data and can push output to data stores and other services.
  • If you need something that works out of the box, choose Flume, once you have decided whether a push or a pull model makes more sense for your data. If streaming data is, for now, just a small add-on to an already developed Hadoop environment, Storm is a good choice.

  • Yes, it is possible to ingest log data into a Hadoop cluster using Storm.

  • Storm can be used as an alternative to Flume, although you have to build the ingestion topology yourself.
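
To make the spout and bolt roles concrete, here is a minimal sketch in plain Python. It does not use the Storm API at all: `log_spout`, `parse_bolt`, `batching_sink_bolt`, and `run_topology` are hypothetical names made up for illustration, the log format ("LEVEL message") is an assumption, and the returned list of batches merely stands in for files that a real bolt would write to HDFS.

```python
from typing import Iterable, Iterator, List, Tuple

LogTuple = Tuple[str, str]  # (level, message); assumed log format "LEVEL message"


def log_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: the data source. Emits one raw log line at a time."""
    for line in lines:
        yield line


def parse_bolt(stream: Iterable[str]) -> Iterator[LogTuple]:
    """Bolt: splits each line into (level, message) and skips malformed lines."""
    for line in stream:
        level, sep, message = line.partition(" ")
        if sep and message:
            yield level, message.strip()


def batching_sink_bolt(stream: Iterable[LogTuple], batch_size: int) -> List[List[LogTuple]]:
    """Terminal bolt: buffers tuples and flushes them in fixed-size batches.

    The returned list is a stand-in for batches written to a store such as HDFS.
    """
    written: List[List[LogTuple]] = []
    batch: List[LogTuple] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            written.append(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        written.append(batch)
    return written


def run_topology(lines: Iterable[str], batch_size: int = 2) -> List[List[LogTuple]]:
    """Wires spout -> parse bolt -> sink bolt, loosely mirroring a Storm topology."""
    return batching_sink_bolt(parse_bolt(log_spout(lines)), batch_size)


if __name__ == "__main__":
    sample = ["INFO service started", "WARN disk at 91%", "garbage", "ERROR request failed"]
    for number, batch in enumerate(run_topology(sample)):
        print(number, batch)
```

In a real deployment the spout would read from a live source such as a message queue, and the last bolt would write to HDFS rather than to an in-memory list. With Flume, the equivalent pipeline is declared in an agent configuration file instead of being coded.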
Anand Jain answered Sep 19 '22 00:09