
Can I customize partitioning in Kinesis Firehose before delivering to S3?

I have a Firehose stream that is intended to ingest millions of events from different sources and of different event types. The stream should deliver all data to one S3 bucket as a store of raw, unaltered data.

I was thinking of partitioning this data in S3 based on metadata embedded within the event message, such as event-source, event-type and event-date.

However, Firehose follows its default partitioning based on record arrival time. Is it possible to customize this partitioning behavior to fit my needs?

Update: Accepted answer updated as a new answer suggests the feature is available as of Sep 2021

mowienay asked Jul 12 '18 20:07


People also ask

What is dynamic partitioning in Firehose?

Dynamic partitioning enables you to continuously partition streaming data in Kinesis Data Firehose by using keys within data (for example, customer_id or transaction_id ) and then deliver the data grouped by these keys into corresponding Amazon Simple Storage Service (Amazon S3) prefixes.

What basic restrictions apply to tags in Kinesis Data Firehose?

Basic restrictions: the maximum number of tags per resource (stream) is 50. Tag keys and values are case-sensitive. You can't change or edit tags for a deleted stream.

What is the difference between Amazon Kinesis and Firehose?

Kinesis Data Streams is a low-latency streaming service for ingesting data at scale. Kinesis Firehose, on the other hand, is a data delivery service: its primary purpose is loading streaming data into destinations such as Amazon S3, Splunk, Elasticsearch, and Redshift.

Can Firehose create a delivery stream to S3?

We have now created successfully a delivery stream using Amazon Kinesis Firehose for S3 and have successfully tested it. You can look more into Kinesis Firehose where the destination might be Amazon Redshift or the producer might be a Kinesis datastream.


1 Answer

Since September 1st, 2021, AWS Kinesis Firehose supports this feature. Read the announcement blog post here.

From the documentation:

You can use the Key and Value fields to specify the data record parameters to be used as dynamic partitioning keys and jq queries to generate dynamic partitioning key values. ...
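To make the quoted passage concrete, here is a minimal Python sketch of the same setup done through the API instead of the UI. It only builds the S3 destination settings as a plain dictionary and makes no AWS call. The helper name `build_s3_destination`, the ARNs, the `errors/` prefix and the record field names (`source`, `type`, `date`) are all hypothetical placeholders; the structure follows my understanding of the `ExtendedS3DestinationConfiguration` shape, so check it against the current API reference before relying on it.

```python
import json


def build_s3_destination(bucket_arn, role_arn, partition_fields):
    """Build S3 destination settings with dynamic partitioning enabled.

    partition_fields maps a partitioning key name to the (hypothetical)
    top-level JSON field in each record that holds its value.
    """
    # jq object constructor, e.g. {event_source:.source,event_type:.type}
    query = "{" + ",".join(
        f"{key}:.{field}" for key, field in partition_fields.items()
    ) + "}"
    # One S3 "folder" per key, e.g. event_source=!{partitionKeyFromQuery:event_source}/
    prefix = "".join(
        f"{key}=!{{partitionKeyFromQuery:{key}}}/" for key in partition_fields
    )
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "Prefix": prefix,
        # Assumed location for records whose keys cannot be extracted.
        "ErrorOutputPrefix": "errors/",
        # Assumption: dynamic partitioning needs a buffer of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {
                            "ParameterName": "MetadataExtractionQuery",
                            "ParameterValue": query,
                        },
                        {
                            "ParameterName": "JsonParsingEngine",
                            "ParameterValue": "JQ-1.6",
                        },
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    config = build_s3_destination(
        "arn:aws:s3:::example-raw-events",  # placeholder bucket ARN
        "arn:aws:iam::123456789012:role/example-firehose-role",  # placeholder role ARN
        {"event_source": "source", "event_type": "type", "event_date": "date"},
    )
    print(json.dumps(config, indent=2))
```

The resulting dictionary is meant to be passed as the `ExtendedS3DestinationConfiguration` argument of boto3's `create_delivery_stream`. Keeping the builder separate from the AWS call lets you inspect the generated jq query and prefix before creating anything.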

Here is how it looks in the UI:

[two screenshots, not preserved]

Vlad Holubiev answered Sep 29 '22 18:09