I would like to be able to route data sent to Kinesis Firehose based on the content inside the data. For example, if I sent this JSON data:
{
"name": "John",
"id": 345
}
I would like to filter the data based on id and send it to a subfolder of my S3 bucket, like S3://myS3Bucket/345_2018_03_05. Is this at all possible with Kinesis Firehose or AWS Lambda?
The only way I can think of right now is to create a Kinesis stream for every single one of my possible IDs, point them all at the same bucket, and then send my events to those streams in my application. I would like to avoid that, though, since there are many possible IDs.
This is not possible out of the box, but here are some ideas...
You could write a Data Transformation Lambda function that Amazon Kinesis Firehose triggers for every record. You could code that Lambda function to save the data to a specific file in S3 itself, rather than having Firehose do it. However, you'd miss out on the record aggregation features of Firehose.
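If you go that route, a minimal sketch of such a transformation function might look like the following. This is an assumption-laden illustration, not a drop-in solution: it reuses the bucket name from the question, assumes every record carries an id field, and returns each record as Dropped so Firehose does not also deliver it to its own destination.

```python
import base64
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "myS3Bucket"  # the bucket from the question; adjust to yours

def handler(event, context):
    output = []
    for record in event["records"]:
        # Firehose hands each record to the transformation Lambda base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))

        # Build the id_YYYY_MM_DD prefix the question asks for, e.g. 345_2018_03_05.
        date_part = datetime.now(timezone.utc).strftime("%Y_%m_%d")
        key = f"{payload['id']}_{date_part}/{record['recordId']}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode())

        # 'Dropped' tells Firehose the record was handled here and should not
        # also be delivered to the stream's configured destination.
        output.append({"recordId": record["recordId"], "result": "Dropped"})
    return {"records": output}
```

Note the trade-off this makes concrete: every record becomes its own S3 object, which is exactly the record aggregation you give up by bypassing Firehose's delivery.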
You could use Amazon Kinesis Analytics to examine each record and send the data to a different output stream based on its content. For example, you could have a separate Firehose delivery stream per destination, with Kinesis Analytics queries choosing which stream each record goes to.
You probably want to use an S3 event notification that fires each time Firehose puts a new file in your S3 bucket (a PUT). The notification should invoke a custom Lambda function you write that reads the contents of the S3 file, splits it up, and writes the records out to the separate buckets or prefixes. Keep in mind that each S3 file will likely contain many records, not just one.
https://aws.amazon.com/blogs/aws/s3-event-notification/
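Here is a rough sketch of what that splitter function could look like, under two assumptions worth calling out: the records in each delivered file are newline-delimited JSON (which requires your producer to append a newline to each record, since Firehose simply concatenates what it receives), and the output reuses the hypothetical id_YYYY_MM_DD prefix scheme from the question.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
# Hypothetical destination; use a different bucket (or a notification prefix
# filter) so writing the output does not re-trigger this same function.
OUTPUT_BUCKET = "myS3Bucket"

def handler(event, context):
    # Invoked by an s3:ObjectCreated:* notification on the Firehose bucket.
    for notification in event["Records"]:
        bucket = notification["s3"]["bucket"]["name"]
        key = unquote_plus(notification["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()

        # Group the many records in the delivered file by their id field.
        groups = defaultdict(list)
        for line in body.splitlines():
            if line.strip():
                record = json.loads(line)
                groups[record["id"]].append(line)

        # Write one object per id under an id_YYYY_MM_DD prefix.
        date_part = datetime.now(timezone.utc).strftime("%Y_%m_%d")
        for record_id, lines in groups.items():
            out_key = f"{record_id}_{date_part}/{key.rsplit('/', 1)[-1]}"
            s3.put_object(
                Bucket=OUTPUT_BUCKET,
                Key=out_key,
                Body="\n".join(lines).encode(),
            )
```

This approach keeps Firehose's aggregation intact and only reorganizes the data after delivery, at the cost of storing each record twice and a short delay before the per-id copies appear.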