There are examples and documentation on copying data from Kafka topics to S3, but how do you copy data from S3 to Kafka?
Step 1: Start ZooKeeper and the Kafka server. Step 2: Run 'kafka-console-producer' on the command line. It reads data from standard input and writes it to a Kafka topic.
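If you would rather do the same thing programmatically, a minimal sketch with the Kafka Java client that mirrors what 'kafka-console-producer' does could look like this (the broker address and topic name are placeholders):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Properties;

public class StdinProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            // Each line typed on stdin becomes one record on the topic,
            // just like kafka-console-producer.
            while ((line = in.readLine()) != null) {
                producer.send(new ProducerRecord<>("my-topic", line)); // "my-topic" is a placeholder
            }
        }
    }
}
```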
The S3 connector, currently available as a sink, allows you to export data from Kafka topics to S3 objects in either Avro or JSON format. In addition, for certain data layouts, the S3 connector exports data with exactly-once delivery semantics to consumers of the S3 objects it produces.
Sending large files directly through Kafka is possible and sometimes easier to implement than staging them in object storage first; the architecture is simpler and can be more cost-effective.
When you read an S3 object, you get back a byte stream, and you can send any byte array to Kafka with ByteArraySerializer.
Or you can parse that InputStream into some custom object, then send it using whatever serializer you configure.
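As a minimal sketch of the raw-bytes approach, using the AWS SDK v2 and the Kafka Java client (the bucket, key, topic, and broker address below are placeholders):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

import java.util.Properties;

public class S3ObjectToKafka {
    public static void main(String[] args) {
        // Placeholder bucket, key, and topic names -- substitute your own.
        String bucket = "my-bucket";
        String key = "path/to/object";
        String topic = "s3-objects";

        // Read the whole S3 object into a byte array (AWS SDK v2).
        byte[] payload;
        try (S3Client s3 = S3Client.create()) {
            payload = s3.getObjectAsBytes(
                    GetObjectRequest.builder().bucket(bucket).key(key).build()
            ).asByteArray();
        }

        // Produce the raw bytes to Kafka, using the object key as the record key.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>(topic, key, payload));
        }
    }
}
```

Note that brokers reject records larger than message.max.bytes (roughly 1 MB by default), so this only works as-is for small objects; for larger ones you would need to raise that limit or split the object into chunks.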
You can find one example of a Kafka Connect process here (which I assume you are comparing to Confluent's S3 Connect writer): https://jobs.zalando.com/tech/blog/backing-up-kafka-zookeeper/index.html. It can be configured to read binary archives or line-delimited text from S3.
Similarly, Apache Spark, Flink, Beam, NiFi, and similar Hadoop-related tools can read from S3 and write events to Kafka as well.
The problem with this approach is that you need to keep track of which files have been read so far, as well as handle partially read files.
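For illustration, here is a hedged batch sketch of that route with Spark. It assumes the spark-sql-kafka-0-10 connector and an s3a-capable Hadoop configuration are on the classpath; the bucket path, topic, and broker address are placeholders, and it does not solve the bookkeeping problem above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3FilesToKafka {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("S3ToKafka")
                .getOrCreate();

        // Read line-delimited text from S3; the resulting DataFrame has a single
        // string column named "value", which is what the Kafka sink expects.
        Dataset<Row> lines = spark.read().text("s3a://my-bucket/input/"); // placeholder path

        lines.write()
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")  // assumed broker address
             .option("topic", "s3-lines")                          // placeholder topic
             .save();

        spark.stop();
    }
}
```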