
What do you use Apache Kafka for? [closed]

I would like to ask if my understanding of Kafka is correct.

For a really, really big data stream, a conventional database is not adequate, so people use things such as Hadoop or Storm. Does Kafka sit on top of those systems and provide ...directions for where the real-time data should go?

Loredra L asked May 17 '16 11:05

People also ask

What can Apache Kafka be used for?

Kafka is primarily used to build real-time streaming data pipelines and applications that adapt to the data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data.

What problem does Kafka solve?

By dividing topics into partitions, Kafka lets consuming applications read data in parallel. There is a catch: within a consumer group, Kafka assigns each partition to at most one consumer (but one consumer can be assigned many partitions).
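The rule above can be illustrated with a small sketch. This is not the Kafka client API; it is a hypothetical round-robin assignment function showing how partitions spread across a consumer group while each partition still lands on exactly one consumer.

```python
# Hypothetical sketch (not Kafka's actual assignor): round-robin assignment
# of partitions to consumers, illustrating "one partition -> one consumer".

def assign_partitions(partitions, consumers):
    """Assign each partition to exactly one consumer, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Example: 6 partitions split across 2 consumers.
plan = assign_partitions(list(range(6)), ["consumer-a", "consumer-b"])
print(plan)  # {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3, 5]}
```

Note that adding more consumers than partitions leaves the extras idle, which is exactly why partition count caps consumer parallelism.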

When should I use Kafka over REST API?

The purpose of an API is essentially to provide a way for different services, development teams, microservices, etc. to communicate. The REST API is one of the most popular API architectures out there. But when you need to build an event streaming platform, you use the Kafka API.

How Netflix uses Kafka?

Essentially, Netflix's stream-processing layer consumes data streams from various Kafka topics and processes or transforms them as needed. After processing, the resulting data stream is published to another Kafka topic for downstream use, or used to transform an existing topic.


1 Answer

I don't think so.

Kafka is a messaging system; it does not sit on top of a database.

You can compare Kafka with messaging systems such as ActiveMQ, RabbitMQ, etc.

From the Apache documentation page:

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Key takeaways:

  1. Kafka maintains feeds of messages in categories called topics.
  2. We'll call processes that publish messages to a Kafka topic producers.
  3. We'll call processes that subscribe to topics and process the feed of published messages consumers.
  4. Kafka is run as a cluster composed of one or more servers, each of which is called a broker.
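The four concepts above can be pictured with a toy in-memory sketch. This is not the Kafka API (a real client such as kafka-python needs a running broker); the `Broker` class, `publish`, and `fetch` names are invented for illustration only.

```python
# Toy in-memory model of topics, producers, and consumers.
# NOT Kafka code: Broker/publish/fetch are hypothetical names.

from collections import defaultdict

class Broker:
    """Holds an append-only log of messages per topic."""
    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        """What a producer does: append a message to a topic."""
        self.topics[topic].append(message)

    def fetch(self, topic, offset):
        """What a consumer does: read the feed from a given offset."""
        return self.topics[topic][offset:]

broker = Broker()
broker.publish("page-views", {"user": "alice", "page": "/home"})
broker.publish("page-views", {"user": "bob", "page": "/cart"})

# A consumer reads every message published since its last offset.
messages = broker.fetch("page-views", offset=0)
print(len(messages))  # 2
```

The key property this mirrors is that the log is append-only and consumers track their own read position (offset), rather than the broker deleting messages on delivery as a traditional queue would.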


Communication between the clients and the servers is done with a simple, high-performance, language agnostic TCP protocol.

Use Cases:

  1. Messaging: Kafka works well as a replacement for a more traditional message broker. In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.
  2. Website Activity Tracking: The original use case for Kafka was to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds.
  3. Metrics: Kafka is often used for operational monitoring data, which involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
  4. Log Aggregation
  5. Stream Processing
  6. Event Sourcing: a style of application design where state changes are logged as a time-ordered sequence of records.
  7. Commit Log: Kafka can serve as a kind of external commit log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data.
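The event-sourcing use case is worth a small sketch: current state is never stored directly, only a time-ordered log of changes, and state is rebuilt by replaying the log. The account/balance example below is invented for illustration, not taken from Kafka itself.

```python
# Minimal event-sourcing sketch (hypothetical example, not Kafka code):
# state changes live in an append-only, time-ordered log, and the current
# state is derived by replaying that log from the beginning.

events = [  # the log, conceptually like messages on a Kafka topic
    {"type": "deposit", "amount": 100},
    {"type": "withdraw", "amount": 30},
    {"type": "deposit", "amount": 50},
]

def replay(log):
    """Rebuild the account balance by applying every event in order."""
    balance = 0
    for e in log:
        balance += e["amount"] if e["type"] == "deposit" else -e["amount"]
    return balance

print(replay(events))  # 120
```

Because the log is the source of truth, a failed node (or a brand-new consumer) can recover state simply by replaying from offset zero, which is the same mechanism the commit-log use case relies on.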
Ravindra babu answered Oct 15 '22 04:10