Elasticsearch vs Kafka: Putting intelligence in producers

Elasticsearch and Kafka are both distributed systems, but they take different approaches to putting intelligence in data producers. In ES, producers of data have no say in where the data will be stored; they simply ask the cluster to store it. In Kafka, the producer knows the internal state of the cluster (it knows which nodes host the partitions of a topic) and can tell the cluster to store the data in a particular partition.
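To make the producer-side choice concrete, here is a minimal Python sketch of a producer that picks a partition itself. `ToyProducer` and `choose_partition` are hypothetical names, not Kafka client APIs. `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to record keys, and round-robin is a simplification of how real clients spread keyless records.

```python
import itertools
import zlib
from typing import Optional


class ToyProducer:
    """Hypothetical producer that chooses a partition itself, Kafka-style.

    Assumptions: zlib.crc32 stands in for Kafka's murmur2 key hash, and
    round-robin stands in for the client's handling of keyless records.
    """

    def __init__(self, num_partitions: int):
        # The partition count is cluster metadata the producer has fetched.
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def choose_partition(self, key: Optional[bytes], partition: Optional[int] = None) -> int:
        if partition is not None:
            # The caller explicitly names the target partition.
            if not 0 <= partition < self.num_partitions:
                raise ValueError("partition out of range")
            return partition
        if key is not None:
            # Same key -> same partition, which preserves per-key ordering.
            return zlib.crc32(key) % self.num_partitions
        # No key and no explicit partition: spread records across partitions.
        return next(self._round_robin)


# Usage: records with the same key always go to the same partition.
producer = ToyProducer(num_partitions=6)
assert producer.choose_partition(b"user-42") == producer.choose_partition(b"user-42")
assert producer.choose_partition(None, partition=3) == 3
```

The point of the sketch is that the decision is made on the client, using metadata (the partition count) that the producer has obtained from the cluster.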

Clearly Kafka and ES are built for different use cases, but I'm struggling to connect those use cases to this design decision: why does Kafka allow producers to determine where data is stored, but ES doesn't?

Nick Peterson asked Aug 19 '17 21:08



1 Answer

They simply ask the cluster to store the data

That's not true. In ES you can leverage routing to decide which shard your document will end up in. It's pretty much the same concept as deciding which Kafka topic partition your message will be stored in.
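Routing in ES works along similar lines. Below is a simplified Python sketch, assuming `zlib.crc32` as a stand-in for Elasticsearch's internal murmur3 hash and ignoring the `index.number_of_routing_shards` refinement used when splitting indices. By default the routing value is the document `_id`; a custom value, passed as the `routing` parameter on an index request (e.g. `PUT my-index/_doc/1?routing=user1`), overrides it. `pick_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import zlib
from typing import Optional


def pick_shard(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """Simplified sketch of Elasticsearch-style document routing.

    Assumptions: zlib.crc32 stands in for Elasticsearch's murmur3 hash, and
    the number_of_routing_shards refinement used for index splitting is omitted.
    """
    # By default the document _id is the routing value; a custom value overrides it.
    routing_value = routing if routing is not None else doc_id
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Documents sharing a routing value land on the same shard,
# much like Kafka records sharing a key land on the same partition.
a = pick_shard("order-1", 5, routing="customer-7")
b = pick_shard("order-2", 5, routing="customer-7")
assert a == b
```

One difference the sketch glosses over: a Kafka producer computes the partition on the client, whereas an ES client only supplies the routing value and the node that receives the request computes the target shard.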

Kafka and ES are clearly built for different use cases. The former is a distributed commit log and the latter is a search and analytics engine. Different products, different use cases.

Even though they are different, they are complementary and can work pretty well together via Logstash, where Kafka can act as an input buffer for Elasticsearch.

Val answered Sep 20 '22 06:09