Streaming data from Kafka into Cassandra in real time

What's the best way to write data from Kafka into Cassandra? I would expect this to be a solved problem, but there doesn't seem to be a standard adapter. Many people seem to use Storm to read from Kafka and then write to Cassandra, but Storm seems like overkill for simple ETL operations.

EugeneMi asked Apr 14 '15 17:04


2 Answers

We use Kafka and Cassandra heavily, connected through Storm.

We rely on Storm because:

  • There are usually many distributed (inter-node) processing steps, implemented as Storm bolt topologies, before the result of the original message reaches Cassandra.

  • We don't need to maintain the Kafka consumer state (offsets) ourselves: the Storm-Kafka connector does it for us, advancing the offset once all tuples derived from the original message have been acked within Storm.

  • Storm natively distributes message processing across nodes.
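
The offset bookkeeping described in the second bullet can be modeled roughly as follows. This is a simplified, hypothetical illustration (the `OffsetTracker` class and its methods are invented here), not the Storm-Kafka connector's code; Storm tracks tuple trees with its own acker mechanism rather than per-message counters. The point it shows is that the committable offset can only advance past a message once every tuple derived from it has been acked, even if later messages finish first.

```python
from typing import Dict


class OffsetTracker:
    """Hypothetical per-partition tracker of in-flight Kafka offsets.

    A message is finished only when every tuple derived from it has been acked.
    The safe offset to commit is the lowest offset still in flight, or one past
    the last emitted offset when nothing is pending, so a crash can cause
    replays but never skips an unfinished message.
    """

    def __init__(self, start_offset: int) -> None:
        self.next_offset = start_offset    # next offset expected from Kafka
        self.pending: Dict[int, int] = {}  # offset -> derived tuples not yet acked

    def emit(self, offset: int, fanout: int) -> None:
        """Record that message `offset` fanned out into `fanout` tuples."""
        if offset != self.next_offset:
            raise ValueError("offsets must be emitted in order")
        if fanout < 1:
            raise ValueError("fanout must be at least 1")
        self.pending[offset] = fanout
        self.next_offset = offset + 1

    def ack(self, offset: int) -> None:
        """Ack one derived tuple; the message is done when its count hits zero."""
        self.pending[offset] -= 1
        if self.pending[offset] == 0:
            del self.pending[offset]

    def committable(self) -> int:
        """Offset that is safe to commit (Kafka commits the next offset to read)."""
        return min(self.pending) if self.pending else self.next_offset


t = OffsetTracker(start_offset=10)
t.emit(10, fanout=2)    # message 10 feeds two bolts
t.emit(11, fanout=1)
t.ack(11)               # message 11 finishes first...
print(t.committable())  # 10: ...but message 10 is still in flight
t.ack(10)
t.ack(10)
print(t.committable())  # 12
```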

Otherwise, if your case is very simple, you can read messages from Kafka and write the results to Cassandra directly, without Storm.
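
A minimal sketch of such a loop, assuming hypothetical `Source` and `Sink` interfaces that stand in for a real Kafka consumer and a Cassandra session (no real client-library API is used; the in-memory fakes are for illustration only). The design choice that matters is committing offsets only after the whole batch has been written: that gives at-least-once delivery, so after a crash the uncommitted records are re-read and re-written. For plain inserts keyed by primary key, Cassandra writes are upserts, so such a replay overwrites the same rows instead of duplicating them.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a consumed Kafka record."""
    partition: int
    offset: int
    key: str
    value: str


class Source(Protocol):
    """Hypothetical consumer interface: poll a batch, commit offsets."""
    def poll(self) -> List[Record]: ...
    def commit(self, offsets: Dict[int, int]) -> None: ...


class Sink(Protocol):
    """Hypothetical writer interface, e.g. wrapping a Cassandra INSERT."""
    def write(self, key: str, value: str) -> None: ...


def run_once(source: Source, sink: Sink) -> Dict[int, int]:
    """Poll one batch, write every record, then commit (at-least-once)."""
    next_offsets: Dict[int, int] = {}
    for rec in source.poll():
        sink.write(rec.key, rec.value)  # an exception here skips the commit below
        # Kafka convention: the committed offset is the next offset to read.
        next_offsets[rec.partition] = max(next_offsets.get(rec.partition, 0), rec.offset + 1)
    if next_offsets:
        source.commit(next_offsets)
    return next_offsets


class ListSource:
    """In-memory fake consumer, for illustration only."""
    def __init__(self, records: List[Record], batch_size: int = 2) -> None:
        self.records, self.batch_size = records, batch_size
        self.committed: Dict[int, int] = {}
        self._pos = 0

    def poll(self) -> List[Record]:
        batch = self.records[self._pos:self._pos + self.batch_size]
        self._pos += len(batch)
        return batch

    def commit(self, offsets: Dict[int, int]) -> None:
        self.committed.update(offsets)


class DictSink:
    """In-memory fake of a Cassandra table: writes are upserts by key."""
    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


records = [Record(0, 0, "a", "1"), Record(0, 1, "b", "2"), Record(1, 0, "a", "3")]
src, sink = ListSource(records), DictSink()
while run_once(src, sink):  # demo only: a real consumer would keep polling
    pass
print(sink.rows)      # {'a': '3', 'b': '2'}
print(src.committed)  # {0: 2, 1: 1}
```

In a real deployment the fakes would be replaced by a Kafka consumer with auto-commit disabled and a Cassandra session executing a prepared INSERT.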

viktortnk answered Oct 28 '22 11:10


A recent Kafka release (0.9) introduced Kafka Connect, which makes sources and sinks first-class concepts in the design. With it, you do not need a separate stream-processing framework just to move data into or out of Kafka. Here is a Cassandra connector for Kafka that you can use: https://github.com/tuplejump/kafka-connect-cassandra
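
In distributed mode, a Kafka Connect worker creates a connector when you POST a JSON body of the form {"name": ..., "config": {...}} to its `/connectors` REST endpoint (standalone mode reads a properties file instead). The sketch below only builds that request. `name`, `connector.class`, `tasks.max` and `topics` are standard sink-connector settings; the connector class string is a hypothetical placeholder, and any Cassandra-specific keys (hosts, keyspace, table mapping) must be taken from the connector's README and passed via `extra`, since none are assumed here.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_connector_request(
    connect_url: str,
    name: str,
    connector_class: str,
    topics: List[str],
    tasks_max: int = 1,
    extra: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Kafka Connect REST request creating a sink connector."""
    config = {
        # Standard Kafka Connect sink settings; values are passed as strings.
        "name": name,
        "connector.class": connector_class,
        "tasks.max": str(tasks_max),
        "topics": ",".join(topics),
    }
    # Connector-specific settings (Cassandra hosts, keyspace, table mapping)
    # must come from the connector's own README; none are invented here.
    config.update(extra or {})
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_connector_request(
    "http://localhost:8083",  # 8083 is the Connect worker's default REST port
    name="cassandra-sink",
    # Placeholder: use the class name given in the connector's README.
    connector_class="com.example.HypotheticalCassandraSink",
    topics=["events"],
)
print(req.get_method(), req.full_url)  # POST http://localhost:8083/connectors
# urllib.request.urlopen(req) would submit this to a running Connect worker.
```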

Aravind Yarram answered Oct 28 '22 13:10