
How to transfer data from S3 bucket to Kafka

There are examples and documentation on copying data from Kafka topics to S3 but how do you copy data from S3 to Kafka?

txporttm asked Apr 03 '19 15:04





1 Answer

When you read an S3 object, you get a byte stream. And you can send any byte array to Kafka with ByteArraySerializer.
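As a rough Python sketch of that first option (the answer itself is phrased in Java terms), the function below reads one object and forwards its raw bytes as a single record. It assumes boto3 for S3 and the kafka-python package for the producer; the broker address, bucket, key, and topic names are hypothetical placeholders. With kafka-python, leaving `value_serializer` unset means `send()` takes bytes directly, which plays the role of `ByteArraySerializer`.

```python
def copy_object_to_topic(s3_client, producer, bucket, key, topic):
    """Send the full contents of one S3 object as a single Kafka record."""
    # boto3's get_object returns a dict; "Body" is a file-like byte stream.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    payload = body.read()  # raw bytes, no decoding

    # No value_serializer is configured, so the value must already be bytes.
    # The object key is reused as the record key (an illustrative choice).
    producer.send(topic, key=key.encode("utf-8"), value=payload)
    producer.flush()  # block until the record has actually been handed off
    return len(payload)


def main():
    # Third-party imports are kept local so the function above can be
    # exercised with stand-in objects when these packages are not installed.
    import boto3
    from kafka import KafkaProducer  # kafka-python package

    producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed address
    copy_object_to_topic(
        boto3.client("s3"), producer,
        "my-bucket", "data/input.bin", "my-topic",  # hypothetical names
    )
```

This only suits small objects: brokers and producers cap record size (roughly 1 MB by default), so larger files would need to be split into multiple records or have those limits raised.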

Or you can parse that InputStream to some custom object, then send that using whatever serializer you can configure.
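A minimal sketch of that parse-then-serialize route, again in Python and assuming the object holds line-delimited JSON. The `Event` type and its `user_id` and `action` fields are made up for illustration. The producer is assumed to be a kafka-python `KafkaProducer` built with `value_serializer=serialize_event`, and `lines` is assumed to be an iterable of byte lines, for example `body.iter_lines()` on a boto3 response body.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Event:
    """Hypothetical record type; the fields are illustrative only."""
    user_id: str
    action: str


def parse_events(lines):
    """Yield one Event per non-empty line of line-delimited JSON bytes."""
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        fields = json.loads(line)
        yield Event(user_id=fields["user_id"], action=fields["action"])


def serialize_event(event):
    """Turn an Event into UTF-8 JSON bytes; usable as a value_serializer."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def send_events(lines, producer, topic):
    """Parse each line and send it; the producer applies its own serializer."""
    count = 0
    for event in parse_events(lines):
        producer.send(topic, value=event)
        count += 1
    producer.flush()
    return count
```

Keeping the parsing separate from the serializer means the same `Event` objects could later be written as Avro or another format by swapping only `serialize_event`.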

You can find one example of a Kafka Connect process (which I assume is what you are comparing to Confluent's S3 Connect writer) at https://jobs.zalando.com/tech/blog/backing-up-kafka-zookeeper/index.html. It can be configured to read binary archives or line-delimited text from S3.

Similarly, Apache Spark, Flink, Beam, NiFi, and similar Hadoop-related tools can read from S3 and write events to Kafka as well.


The problem with this approach is that you need to keep track of which files have been read so far, and also handle partially read files.
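One way to handle that bookkeeping, sketched in Python under several assumptions: objects are line-delimited and never overwritten, progress is kept in a local JSON file (the `Checkpoint` class is hypothetical, not part of any library), and the clients are boto3 and kafka-python as stand-ins. Finished keys are recorded in a set, and a key in progress stores a byte offset that is used in an S3 `Range` request on restart. The offset is saved only after a `flush()`, so a crash replays at most one batch, giving at-least-once rather than exactly-once delivery.

```python
import json
import os


class Checkpoint:
    """Hypothetical file-backed record of finished keys and resume offsets."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        self.offsets = {}
        if os.path.exists(path):
            with open(path) as f:
                state = json.load(f)
            self.done = set(state["done"])
            self.offsets = state["offsets"]

    def save(self):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"done": sorted(self.done), "offsets": self.offsets}, f)
        os.replace(tmp, self.path)  # atomic swap, so the file is never half-written


def copy_lines_with_resume(s3_client, producer, bucket, key, topic,
                           checkpoint, batch_size=500):
    """Send each line of an object as a record, resuming from the checkpoint."""
    if key in checkpoint.done:
        return 0  # already fully processed
    offset = checkpoint.offsets.get(key, 0)
    size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    sent = 0
    if offset < size:  # a Range starting at or past the end would be rejected
        body = s3_client.get_object(
            Bucket=bucket, Key=key, Range=f"bytes={offset}-")["Body"]
        for line in body.iter_lines(keepends=True):
            producer.send(topic, value=line.rstrip(b"\r\n"))
            offset += len(line)  # includes the line ending, so offsets stay on line boundaries
            sent += 1
            if sent % batch_size == 0:
                producer.flush()  # make sure the batch is out before recording it
                checkpoint.offsets[key] = offset
                checkpoint.save()
    producer.flush()
    checkpoint.offsets.pop(key, None)
    checkpoint.done.add(key)
    checkpoint.save()
    return sent
```

If several workers share the load, the checkpoint would need to live somewhere shared (a database or a compacted Kafka topic, for instance) instead of a local file.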

OneCricketeer answered Sep 29 '22 20:09
