Best practice for reading data from Kafka to AWS Redshift

What is the best practice for moving data from a Kafka cluster to a Redshift table? We have continuous data arriving on Kafka and I want to write it to tables in Redshift (it doesn't have to be in real time).

  • Should I use a Lambda function?
  • Should I write a Redshift connector (consumer) that would run on a dedicated EC2 instance? (The downside is that I'd need to handle redundancy myself.)
  • Is there an AWS pipeline service for this?
Asked Jul 30 '18 by Eran

1 Answer

Kafka Connect is commonly used for streaming data from Kafka to (and from) data stores. It does useful things like automagically managing scale-out, failover, schemas, serialisation, and so on.

This blog shows how to use the open-source JDBC Kafka Connect connector to stream to Redshift. There is also a community Redshift connector, but I've not tried this.
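If you go the Kafka Connect route, the configuration is only a handful of properties. Below is a minimal sketch of a JDBC sink connector config pointed at Redshift; the connector name, topic, cluster endpoint, and credentials are placeholders, and it assumes the JDBC sink connector plugin and a Redshift (or PostgreSQL-compatible) JDBC driver are available on the Connect worker.

```
# redshift-sink.properties — sketch of a JDBC sink connector writing a Kafka topic to Redshift
# (connector name, topic, endpoint, and credentials below are placeholders)
name=redshift-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1

# Kafka topic(s) to sink; by default each topic maps to a table of the same name
topics=orders

# Redshift JDBC endpoint — the driver must be on the Connect worker's classpath
connection.url=jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev
connection.user=etl_user
connection.password=********

# Append-only inserts; let the connector create the target table from the record schema
insert.mode=insert
auto.create=true
pk.mode=none
```

In standalone mode you would run this with something like `connect-standalone worker.properties redshift-sink.properties`; in distributed mode you would POST the equivalent JSON to the Connect REST API, and the cluster rebalances tasks across workers, which covers the redundancy concern in the question.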

This blog shows another approach, not using Kafka Connect.

Disclaimer: I work for Confluent, who created the JDBC connector.

Answered Oct 02 '22 by Robin Moffatt