
Fluentd vs Kafka

The use case is this: I have several Java applications running, each of which interacts with its own specific set of Elasticsearch indices. For instance, application A queries and updates indices A, B, and C, while application B uses indices A, C, and D.

A common interface is required to manage all these data streams. I'm currently evaluating Kafka and Fluentd for this purpose. Can someone explain which is better suited to this situation? I've looked at the features of both Kafka and Fluentd, and I don't really understand what difference the choice would make here. Thanks a lot.

Akshay Arora asked Feb 02 '16 04:02



1 Answer

Kafka provides publish/subscribe messaging built on a distributed commit log. Kafka brokers run together as a cluster, and each application that needs to forward data somewhere else publishes to that cluster through a producer client. The good thing here is that published records are persisted in the log, so if a downstream consumer or its network connection becomes unstable or goes down, your application can continue to produce data/logs and the consumer can catch up later without losing them. Whereas if your application sends logs directly to a remote centralized logging host, you might lose some logs while the network is down.
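To make the commit-log idea concrete, here is a minimal sketch of an append-only log in which each consumer tracks its own read offset. It is a toy in-memory model, not the Kafka client API: the names `ToyCommitLog`, `append`, and `poll` are hypothetical, and a real cluster also handles persistence, partitioning, and replication.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of an append-only log; NOT the Kafka client API. */
class ToyCommitLog {
    private final List<String> records = new ArrayList<>();
    // Each consumer remembers the offset of the next record it has not read yet.
    private final Map<String, Integer> nextOffset = new HashMap<>();

    /** Appends a record and returns the offset it was stored at. */
    int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Returns every record the consumer has not seen yet and advances its offset. */
    List<String> poll(String consumer) {
        int from = nextOffset.getOrDefault(consumer, 0);
        List<String> unread = new ArrayList<>(records.subList(from, records.size()));
        nextOffset.put(consumer, records.size());
        return unread;
    }
}

class ToyCommitLogDemo {
    public static void main(String[] args) {
        ToyCommitLog log = new ToyCommitLog();
        log.append("update index A");
        System.out.println(log.poll("indexer")); // consumer is up to date

        // The consumer is "offline" while the application keeps producing.
        log.append("update index B");
        log.append("update index C");

        // Back online: the consumer resumes from its stored offset.
        System.out.println(log.poll("indexer"));
    }
}
```

Because the log keeps the records and the consumer only keeps a position, the producing application never has to wait for the consumer to be reachable.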

Fluentd is a centralized log collector that is commonly installed on one host (or more, if you need horizontal scaling). It collects data from remote data sources, applies filtering, and sends unified log data to remote data sinks.

From the Fluentd docs, you can see that Fluentd can consume data from Kafka and produce data to Kafka as well. This alone should hint that Fluentd and Kafka sit on different layers, since the former uses the latter.

It would actually be more logical to compare Fluentd with Logstash. As far as Fluentd is concerned, Kafka is just another data source and/or data sink; the two are different beasts altogether.

If you want the best of both worlds, use Kafka as the input/output data pipe from/to your apps, and Fluentd (or Logstash) as your centralized logging system reading from those Kafka topics.
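The layering described above can be sketched as a toy in-memory model: a transport layer that only stores records under named topics (Kafka's role), and a collector layer that reads those topics, filters, and forwards to per-index sinks (the role of Fluentd or Logstash). Everything here is hypothetical and for illustration only, including the class names, the topic names such as `app-a-events`, and the index names; none of it is a real Kafka, Fluentd, or Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy stand-in for the transport layer (Kafka's role): named topics holding records. */
class ToyTopics {
    private final Map<String, List<String>> topics = new LinkedHashMap<>();

    void publish(String topic, String record) {
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(record);
    }

    List<String> read(String topic) {
        return topics.getOrDefault(topic, List.of());
    }
}

/** Toy stand-in for the collector layer (the role of Fluentd or Logstash). */
class ToyCollector {
    private final Map<String, String> topicToIndex = new LinkedHashMap<>();
    private final Map<String, List<String>> indices = new LinkedHashMap<>();

    /** Declares which index the records of a topic should end up in. */
    void route(String topic, String index) {
        topicToIndex.put(topic, index);
    }

    /**
     * Reads every routed topic once, drops records the filter rejects,
     * and forwards the rest. No offsets are tracked, so call it once per run.
     */
    void collect(ToyTopics source, Predicate<String> keep) {
        topicToIndex.forEach((topic, index) -> {
            for (String record : source.read(topic)) {
                if (keep.test(record)) {
                    indices.computeIfAbsent(index, i -> new ArrayList<>()).add(record);
                }
            }
        });
    }

    List<String> indexContents(String index) {
        return indices.getOrDefault(index, List.of());
    }
}

class ToyPipelineDemo {
    public static void main(String[] args) {
        ToyTopics transport = new ToyTopics();
        transport.publish("app-a-events", "doc 1");
        transport.publish("app-a-events", ""); // empty record, filtered out below
        transport.publish("app-b-events", "doc 2");

        ToyCollector collector = new ToyCollector();
        collector.route("app-a-events", "index-a");
        collector.route("app-b-events", "index-c");
        collector.collect(transport, record -> !record.isEmpty());

        System.out.println(collector.indexContents("index-a"));
        System.out.println(collector.indexContents("index-c"));
    }
}
```

The applications only know about topics, and only the collector knows about indices, which is why the two tools complement rather than replace each other.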

If you want to read more on the topic, you can read about how Fluentd and Kafka complement each other very well; in other words, they are not competing against each other.

Val answered Oct 02 '22 11:10