Fluentd vs Kafka

Tags: elasticsearch, apache-kafka, fluentd

The use case is this: I have several Java applications running, each of which interacts with its own specific set of Elasticsearch indices. For instance, application A queries and updates Elasticsearch indices A, B, and C, while application B uses indices A, C, and D (say).

Some common interface is required that can manage all these data streams. I'm currently evaluating Kafka and Fluentd for this purpose. Can someone explain which is better suited to this situation? I've looked at the features of both Kafka and Fluentd, and I don't really understand what difference the choice would make here. Thanks a lot.

585

asked Feb 02 '16 04:02

Akshay Arora

1 Answer

Kafka provides publish/subscribe messaging built on a distributed commit log. One way to deploy it is to run a Kafka broker on each host that produces data to be forwarded somewhere else; all those brokers together form a cluster. The good thing here is that if network connectivity becomes unstable or goes down, your application can keep producing data and logs to its local broker, and they won't be lost. If your application instead sends logs directly to some remote centralized logging host, you might lose logs while the network is down.
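To make the "commit log" idea concrete, here is a minimal in-memory sketch in plain Java. It is a toy model, not the Kafka client API: the class `CommitLogSketch` and its methods `append` and `poll` are names made up for illustration. It assumes a single log where the producer only ever appends, and each consumer keeps its own read offset, so a consumer that was unreachable for a while can catch up later without anything being lost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an append-only log with per-consumer offsets (hypothetical, not the Kafka API). */
class CommitLogSketch {
    // The log itself: records are only ever appended, never modified or removed.
    private final List<String> records = new ArrayList<>();
    // For each consumer id, the offset of the next record it has not read yet.
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Producer side: append a record and return its offset. Does not depend on any consumer. */
    synchronized int append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Consumer side: return all records this consumer has not seen, then advance its offset. */
    synchronized List<String> poll(String consumerId) {
        int from = offsets.getOrDefault(consumerId, 0);
        List<String> batch = new ArrayList<>(records.subList(from, records.size()));
        offsets.put(consumerId, records.size());
        return batch;
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        // The consumer is "offline" while these two records are produced.
        log.append("app-A: started");
        log.append("app-A: indexed document 1");
        // Once it is back, the consumer reads everything from its last offset.
        System.out.println(log.poll("collector"));
        log.append("app-A: indexed document 2");
        // A second poll returns only the record appended since the previous poll.
        System.out.println(log.poll("collector"));
    }
}
```

The point of the design is that producing and consuming are decoupled: `append` succeeds whether or not anyone is reading, and a slow or disconnected reader only falls behind in its offset.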

Fluentd is a centralized log collector that is commonly installed on one host (or on several if you need horizontal scaling). It connects to remote data sources, applies filtering, and sends unified log data to remote data sinks.

From the Fluentd docs, you can see that Fluentd can consume data from Kafka and produce data to Kafka as well. This alone should hint that Fluentd and Kafka sit on different layers, since the former uses the latter.

It would actually be more logical to compare Fluentd with Logstash. As far as Fluentd is concerned, Kafka is just another data source and/or data sink; the two are different beasts altogether.

If you want the best of both worlds, use Kafka as the input/output data pipe from and to your apps, and Fluentd (or Logstash) as your centralized logging system reading from those Kafka topics.
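As a rough illustration of that layering, the following plain-Java sketch models the write path only: applications publish to named topics, and a central collector drains the topics and forwards each record to the index mapped to that topic. Everything here is a stand-in. `PipelineSketch`, `route`, `publish`, `collect`, and the topic and index names are hypothetical; the in-memory queues play the role of Kafka topics, and the lists play the role of Elasticsearch indices. No real Kafka, Fluentd, or Elasticsearch API is used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of apps -> topics -> collector -> indices (all names hypothetical). */
class PipelineSketch {
    // Topic name -> buffered records (stand-in for Kafka topics).
    private final Map<String, Queue<String>> topics = new LinkedHashMap<>();
    // Index name -> stored documents (stand-in for Elasticsearch indices).
    private final Map<String, List<String>> indices = new LinkedHashMap<>();
    // Routing rule owned by the collector: which index each topic feeds.
    private final Map<String, String> topicToIndex = new LinkedHashMap<>();

    /** Register a topic and the index its records should end up in. */
    void route(String topic, String index) {
        topicToIndex.put(topic, index);
        topics.putIfAbsent(topic, new ArrayDeque<>());
        indices.putIfAbsent(index, new ArrayList<>());
    }

    /** Application side: publish to a topic without knowing anything about the sink. */
    void publish(String topic, String record) {
        Queue<String> queue = topics.get(topic);
        if (queue == null) {
            throw new IllegalArgumentException("unknown topic: " + topic);
        }
        queue.add(record);
    }

    /** Collector side: drain every topic into its mapped index; returns the number forwarded. */
    int collect() {
        int forwarded = 0;
        for (Map.Entry<String, Queue<String>> entry : topics.entrySet()) {
            List<String> index = indices.get(topicToIndex.get(entry.getKey()));
            String record;
            while ((record = entry.getValue().poll()) != null) {
                index.add(record);
                forwarded++;
            }
        }
        return forwarded;
    }

    /** Read back what has been forwarded to an index so far. */
    List<String> documents(String index) {
        return indices.getOrDefault(index, List.of());
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch();
        pipeline.route("topic-a", "index-a");
        pipeline.route("topic-c", "index-c");
        pipeline.publish("topic-a", "update from application A");
        pipeline.publish("topic-c", "update from application B");
        System.out.println("forwarded: " + pipeline.collect());
        System.out.println("index-a: " + pipeline.documents("index-a"));
    }
}
```

The applications only know topic names, and the routing from topics to indices lives in one place (the collector), which is the separation of layers described above.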

If you want to read more on the topic, you can read about how Fluentd and Kafka complement each other very well; in other words, they are not competing against each other.

193

answered Oct 02 '22 11:10

Val
