Elasticsearch and Kafka are both distributed systems, but they take different approaches to how much intelligence lives in data producers. In ES, producers of data have no say in where the data will be stored. They simply ask the cluster to store the data. In Kafka, the producer knows the internal state of the cluster (it knows which nodes host the partitions of a topic) and is able to tell the cluster to store the data on a particular partition.
Clearly Kafka and ES are built for different use cases but I'm struggling to connect those use cases to this design decision - why does Kafka allow producers to determine where to store data but ES doesn't?
You can take data you’ve stored in Kafka and stream it into Elasticsearch to then be used for log analysis or full-text search. Alternatively, you can perform real-time analytics on this data or use it with other applications like Kibana. For some background on what Elasticsearch is, you can read this blog post by Sarwar Bhuiyan.
As messages are sent into Kafka with a key, the external supplier is able to send another record through to Kafka with amendments to the price, updating the record in Elasticsearch rather than creating a new record.
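The update-instead-of-create behaviour described above can be sketched in a few lines, assuming the Kafka record key is used as the Elasticsearch document ID. The `apply_record` helper and the plain `dict` standing in for an index are hypothetical, for illustration only, and not part of either product's API.

```python
def apply_record(index: dict, key: str, value: dict) -> str:
    """Store value under key and report whether it was new or an overwrite."""
    # The dict is a stand-in for an Elasticsearch index keyed by document ID.
    result = "updated" if key in index else "created"
    index[key] = value
    return result


index = {}
# First record for this key creates the document.
first = apply_record(index, "sku-1", {"price": 10.0})
# A later record with the same key amends the price in place.
second = apply_record(index, "sku-1", {"price": 9.5})
```

Because both records share the key `sku-1`, the index ends up holding one document with the amended price rather than two documents.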
In this architecture, processing is typically split into two separate stages: the Shipper and Indexer stages. The Logstash instance that receives data from different data sources is called a Shipper, as it doesn't do much processing. Its responsibility is to immediately persist the data it receives to a Kafka topic, and hence it's a producer.
Each node runs Kafka 2.1.1, along with Filebeat and Metricbeat to monitor the node. The Beats are configured via Cloud ID to send data to our Elasticsearch Service cluster.
"They simply ask the cluster to store the data"
That's not true. In ES you can leverage routing in order to decide in which shard your document will end up. It's pretty much the same concept as deciding in which Kafka topic partition your message will be stored.
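To make the parallel concrete, here is a minimal sketch of the shared idea: hash a routing value and take it modulo the number of shards or partitions. The `pick_slot` function is hypothetical, and `zlib.crc32` is only a stand-in hash for this sketch. Elasticsearch and Kafka's default partitioner each use their own Murmur-family hash, so the real slot numbers will differ.

```python
import zlib


def pick_slot(routing_value: str, slot_count: int) -> int:
    """Map a routing value to a slot index (a shard or a partition)."""
    # crc32 is an assumption made for this sketch, not the hash either system uses.
    return zlib.crc32(routing_value.encode("utf-8")) % slot_count


# Elasticsearch-style: the routing value defaults to the document _id,
# but the client may supply its own, for example a customer ID.
shard = pick_slot("customer-42", 5)

# Kafka-style: the record key plays the same role for topic partitions.
partition = pick_slot("customer-42", 12)
```

In both systems the same routing value or key always lands in the same slot, which is what lets a client control placement. A Kafka producer can additionally name a partition number explicitly on the record.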
Kafka and ES are clearly built for different use cases. The former is a distributed commit log and the latter is a search and analytics engine. Different products, different use cases.
Even though they are different, they are complementary and can work pretty well "together" via Logstash, where Kafka can play the role of an input buffer to Elasticsearch.