
Why do we require Apache Kafka with NoSQL databases?

Apache Kafka is a real-time messaging service. It stores streams of data safely in a distributed and fault-tolerant way, and we can filter streaming data as it arrives from producers. What I don't understand is why we need a NoSQL database like MongoDB to store the same data that is already in Apache Kafka. The real question is: why do we store the same data in both a NoSQL database and Apache Kafka?

I think that if we need a NoSQL database, we could collect the streams of data from clients directly into MongoDB, without using Apache Kafka at all. Yet most big data architectures prefer to put Apache Kafka between the data source and the NoSQL database (see, and also see).

What are the advantages of that for real systems?

tolgabuyuktanir asked Feb 12 '18 09:02

People also ask

Can Kafka be used as NoSQL database?

Developers describe Kafka as a “Distributed, fault-tolerant, high throughput, pub-sub, messaging system.” Kafka is well-known as a partitioned, distributed, and replicated commit log service. It also provides the functionality of a messaging system, but with a unique design.

Does Kafka require a database?

Data storage in Kafka: It does not rely much on disk reads as a database would, because it wants to leverage the page cache to serve the data. Only older data is likely to sit on disk at any given time. Kafka has a two-tiered storage approach consisting of local and remote storage.

Why do we need Apache Kafka?

Why would you use Kafka? Kafka is used to build real-time streaming data pipelines and real-time streaming applications. A data pipeline reliably processes and moves data from one system to another, and a streaming application is an application that consumes streams of data.
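The pipeline idea above — records flowing from a producer, through a topic, into another system — can be sketched with a toy in-memory stand-in. This is illustrative Python only, not the real Kafka client API; the `Topic` class and the dict standing in for a NoSQL store are assumptions for the sketch:

```python
# Toy sketch (not a real Kafka client): a minimal "topic" as an append-only
# log, with a producer feeding it and a consumer moving records into a
# key-value store standing in for a NoSQL database such as MongoDB.

class Topic:
    """An append-only log with per-consumer offsets, like a Kafka topic."""
    def __init__(self):
        self.log = []
        self.offsets = {}           # consumer_id -> next position to read

    def produce(self, record):
        self.log.append(record)

    def consume(self, consumer_id):
        """Return all records this consumer has not seen yet."""
        start = self.offsets.get(consumer_id, 0)
        records = self.log[start:]
        self.offsets[consumer_id] = len(self.log)
        return records

topic = Topic()
store = {}                          # stand-in for the NoSQL database

# Producer side: emit events into the topic.
topic.produce({"id": 1, "temp": 21.5})
topic.produce({"id": 2, "temp": 22.0})

# Consumer side: the pipeline moves records into long-term storage.
for record in topic.consume("db-writer"):
    store[record["id"]] = record

print(store)
```

The point is that the topic and the store are decoupled: other consumers can read the same log independently, each tracking its own offset.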


1 Answer

This architecture has several advantages:

  1. Kafka as Data Integration Bus

    It makes it easy to distribute data between several producers and many consumers. Here Apache Kafka serves as a data integration message bus.

  2. Kafka as Data Buffer

    Putting Kafka in front of your "end" data storages like MongoDB or MySQL gives you a natural data buffer, so you are able to deploy/maintain/redeploy your consumer services independently. While your service is down for maintenance, Kafka still stores all incoming data, which is quite useful.

  3. Kafka as a Short-Term Data Storage

    You don't have to store everything in Kafka: very often you use Kafka topics with retention, which means all data older than a configured value is deleted by Kafka automatically. For example, you may have a Kafka topic with a 1-week retention (so you store only 1 week of data), while at the same time your data lives on in long-term storage services like classic SQL databases, Cassandra, etc.

  4. Kafka as a Long-Term Data Storage

    On the other hand, you can use Apache Kafka as a long-term storage system. Using compacted topics enables you to store only the last value for each key, so your topic becomes the last-state storage of your app.
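The buffering behavior in point 2 can be sketched in a few lines. This is a toy model with made-up names (`produce`, `consume_all`, `committed_offset`), not the real Kafka consumer API; it only shows that records produced while the consumer is down are still there when it returns:

```python
# Toy sketch of point 2: while the consumer service is down, the broker
# keeps appending records; when the consumer comes back, it resumes from
# its committed offset and loses nothing.

log = []                 # the topic's append-only log
committed_offset = 0     # last position the consumer has processed

def produce(record):
    log.append(record)

def consume_all():
    """Drain everything newer than the committed offset."""
    global committed_offset
    new = log[committed_offset:]
    committed_offset = len(log)
    return new

produce("event-1")
assert consume_all() == ["event-1"]   # consumer is up and processing

# Consumer goes down for maintenance; producers keep writing.
produce("event-2")
produce("event-3")

# Consumer comes back and catches up -- nothing was lost.
assert consume_all() == ["event-2", "event-3"]
```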
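The time-based retention described in point 3 can be sketched as a filter over timestamped records. A real broker deletes whole log segments rather than individual records, and the window here (`RETENTION_SECONDS`) is an illustrative value, not a default:

```python
import time

# Toy sketch of point 3: time-based retention. Records older than the
# retention window are dropped automatically.

RETENTION_SECONDS = 7 * 24 * 3600    # e.g. a 1-week retention

def apply_retention(log, now):
    """Keep only records whose timestamp is within the retention window."""
    return [(ts, value) for ts, value in log if now - ts <= RETENTION_SECONDS]

now = time.time()
log = [
    (now - 8 * 24 * 3600, "too old"),     # 8 days ago -> expired
    (now - 2 * 24 * 3600, "recent"),      # 2 days ago -> kept
    (now, "fresh"),
]
log = apply_retention(log, now)
print([value for _, value in log])   # ['recent', 'fresh']
```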
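The compacted-topic idea in point 4 boils down to "last value wins per key". The following is a minimal sketch of that semantics (the keys and payloads are invented for illustration); real compaction runs asynchronously on log segments, but the end state is the same:

```python
# Toy sketch of point 4: log compaction keeps only the latest value per
# key, so the topic converges to the last state of the application.

def compact(log):
    """Return one (key, value) per key, keeping the last value seen."""
    latest = {}
    for key, value in log:        # later entries overwrite earlier ones
        latest[key] = value
    return latest

log = [
    ("user-1", {"status": "online"}),
    ("user-2", {"status": "online"}),
    ("user-1", {"status": "offline"}),   # supersedes the first record
]
print(compact(log))
```

Reading such a topic from the beginning therefore rebuilds the current state of every key, which is why a compacted topic can serve as a long-term store.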

codejitsu answered Oct 19 '22 23:10