I'm publishing data to a kinesis stream that is processed by some consumers. I'd like the raw data published to the stream to also be stored in s3. Is it possible to auto wire a kinesis stream to a kinesis firehose or do I need to directly publish to the firehose from a kinesis consumer?
You can configure Amazon Kinesis Data Streams to send information to a Kinesis Data Firehose delivery stream.
Kinesis Data Streams is a low-latency streaming service designed for ingesting data at scale. Kinesis Data Firehose, on the other hand, is a data delivery service: its primary purpose is loading streaming data into Amazon S3, Splunk, Elasticsearch, and Redshift.
If your data source is Kinesis Data Streams and delivery to your Amazon S3 bucket fails, Kinesis Data Firehose retries delivery every 5 seconds, for up to the maximum retention period configured on the Kinesis data stream.
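As a sketch of what that wiring looks like with boto3 (the stream, role, and bucket ARNs below are placeholders you would replace with your own; the actual API call is left commented out since it requires AWS credentials):

```python
# Parameters for creating a Firehose delivery stream that reads from an
# existing Kinesis data stream and delivers the raw records to S3.
# All ARNs and names here are hypothetical placeholders.
params = {
    "DeliveryStreamName": "raw-data-to-s3",
    # Tells Firehose to pull from a Kinesis stream instead of direct PUTs.
    "DeliveryStreamType": "KinesisStreamAsSource",
    "KinesisStreamSourceConfiguration": {
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-read-role",
    },
    "ExtendedS3DestinationConfiguration": {
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-write-role",
        "BucketARN": "arn:aws:s3:::my-raw-data-bucket",
    },
}

# To actually create it (requires credentials and the IAM roles above):
# import boto3
# boto3.client("firehose").create_delivery_stream(**params)
```

The key part is `DeliveryStreamType: "KinesisStreamAsSource"`, which is what auto-wires the stream to Firehose; your existing consumers keep reading from the stream untouched.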
To put data into the stream, you must specify the name of the stream, a partition key, and the data blob to be added to the stream. The partition key is used to determine which shard in the stream the data record is added to. All the data in the shard is sent to the same worker that is processing the shard.
Update @ Aug 18, 2017
Kinesis Firehose can now read data directly from Amazon Kinesis Streams!
https://forums.aws.amazon.com/ann.jspa?annID=4904
Before Aug 18, 2017
Is it possible to auto wire a kinesis stream to a kinesis firehose
At the moment, you can't accomplish this, so you have to wire them together yourself.
AWS provides an open-source Lambda project that forwards records from Kinesis Streams to Kinesis Firehose.
https://github.com/awslabs/lambda-streams-to-firehose
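The core of such a forwarder is small. A minimal sketch (not the awslabs project itself): a Lambda handler that decodes the base64 Kinesis records from the event and pushes them to Firehose with `PutRecordBatch`, which accepts at most 500 records per call. The delivery stream name is a placeholder, and the Firehose client is passed in so the logic can be tested without AWS:

```python
import base64

def forward(event, firehose_client, delivery_stream="my-delivery-stream"):
    """Forward the Kinesis records in a Lambda event to a Firehose stream.

    `delivery_stream` is a hypothetical name; `firehose_client` would be
    `boto3.client("firehose")` in a real Lambda.
    """
    # Kinesis record payloads arrive base64-encoded in the Lambda event.
    records = [
        {"Data": base64.b64decode(r["kinesis"]["data"])}
        for r in event["Records"]
    ]
    # PutRecordBatch takes at most 500 records per call, so chunk the batch.
    for i in range(0, len(records), 500):
        firehose_client.put_record_batch(
            DeliveryStreamName=delivery_stream,
            Records=records[i:i + 500],
        )
    return len(records)
```

A production version (like the awslabs project) also needs to handle partial batch failures reported in the `PutRecordBatch` response and retry the failed records.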