
Why is Kafka so fast? [closed]

Tags:

apache-kafka

Given the same hardware, is there any difference between using Kafka and our current solution (ServiceMix/Camel)? Can Kafka handle "bigger" data than our current solution? Why?

There is an article about how fast it can be, but I still don't clearly understand why Kafka is so fast compared to other solutions: Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)

Jerry Z. asked Sep 17 '15 13:09




1 Answer

Kafka is fast for a number of reasons. To name a few:

  • Zero Copy - See https://en.wikipedia.org/wiki/Zero-copy. Basically, Kafka asks the OS kernel to move the data directly (for example, from the page cache to a network socket) rather than copying it through buffers at the application layer, so data moves fast.
  • Batch Data in Chunks - Kafka is all about batching the data into chunks. This spreads cross-machine latency, and the buffering/copying that accompanies each transfer, over many messages instead of paying it per message.
  • Avoids Random Disk Access - As Kafka is an immutable commit log, it does not need to seek back and forth on the disk doing many random I/O operations; it can just access the disk sequentially. This lets it get speeds from a physical disk that can, in some cases, be comparable to memory.
  • Can Scale Horizontally - The ability to have thousands of partitions for a single topic spread among thousands of machines means Kafka can handle huge loads.
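
As a rough illustration of the first three points, here is a minimal Python sketch, not Kafka's actual code (Kafka itself runs on the JVM). The names `append_batch`, `send_log_zero_copy` and `demo`, and the 4-byte length-prefix framing, are assumptions made up for this example. It appends a whole batch of records to an append-only file with one sequential write, then hands the file to a socket with `socket.sendfile`, which uses the kernel's `sendfile` where the platform supports it and falls back to ordinary `send` otherwise.

```python
import os
import socket
import tempfile


def append_batch(log_path, records):
    """Append a batch of records to the log file with a single write."""
    # Hypothetical framing: 4-byte big-endian length prefix per record.
    payload = b"".join(len(r).to_bytes(4, "big") + r for r in records)
    # "ab" mode only ever appends, so the file is written sequentially.
    with open(log_path, "ab") as log:
        log.write(payload)
    return len(payload)


def send_log_zero_copy(log_path, sock, offset=0):
    """Send the log from `offset` to its end over a connected stream socket."""
    with open(log_path, "rb") as log:
        # socket.sendfile uses os.sendfile when available, so the bytes
        # need not pass through a buffer in this process.
        return sock.sendfile(log, offset=offset)


def demo():
    """Write one batch, ship it over a local socket pair, read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        written = append_batch(path, [b"a", b"bb", b"ccc"])
        left, right = socket.socketpair()
        with left, right:
            sent = send_log_zero_copy(path, left)
            left.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
            received = b""
            while True:
                chunk = right.recv(4096)
                if not chunk:
                    break
                received += chunk
        return written, sent, received
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only the shape of the work: one write per batch rather than one per message, an append-only file, and a file-to-socket transfer that the kernel can perform without a round trip through application buffers.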
phill.tomlinson answered Sep 26 '22 15:09