Apache Kafka is an real-time messaging service. It stores streams of data safely in distributed and fault-tolerant. We can filter streaming data when comming producer. I don't understant that why we need NoSQL databases like as MongoDB to store same data in Apache Kafka. The true question is that why we store same data in a NoSQL database and Apache Kafka?
I think if we need a NoSQL database, we can collect streams of data from clients in MongoDB at first without the use of Apache Kafka. But, most of big data architecture preference using Apache Kafka between data source and NoSQL database.(see)
What is the advantages of that for real systems?
Developers describe Kafka as a “Distributed, fault-tolerant, high throughput, pub-sub, messaging system.” Kafka is well-known as a partitioned, distributed, and replicated commit log service. It also provides the functionality of a messaging system, but with a unique design.
Data storage in KafkaIt does not rely much on disk reads as a database would because it wants to leverage the page cache to serve the data. Only older data is likely to have disk storage at any given time. Kafka has a two-tiered storage approach constituting local and remote storage.
Why would you use Kafka? Kafka is used to build real-time streaming data pipelines and real-time streaming applications. A data pipeline reliably processes and moves data from one system to another, and a streaming application is an application that consumes streams of data.
This architecture has several advantages:
Kafka as Data Integration Bus
It helps distribute data between several producers and many consumers easily. Here Apache Kafka serves as an "data" integration message bus.
Kafka as Data Buffer
Putting Kafka in front of your "end" data storages like MongoDB or MySQL acts like a natural data buffer. So you are able to deploy/maintain/redeploy your consumer services independently. At the time your service is down for maintanance Kafka is still storing all incoming data, that is quite useful.
Kafka as a Short Time Data Storage
You don't have to store everything in Kafka: very often you use Kafka topics with retention. It means all data older than some value will be deleted by Kafka automatically. So, for example you may have Kafka topic with 1 week retention (so you store 1 week of data only) but at the same time your data lives in long time storage services like classic SQL-DBs or Cassandra etc.
Kafka as a Long Time Data Storage
On the other hand you can use Apache Kafka as a long term storage system. Using compacted topics enables you to store only the last value for each key. So your topic becomes a last state storage of your app.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With