
Amazon Kinesis Firehose Buffering to S3

I'm attempting to price out a streaming data / analytic application deployed to AWS and looking at using Kinesis Firehose to dump the data into S3.

To price out the S3 costs for this, I need to figure out how many PUTs I will need.

I know Firehose buffers the data and then flushes it to S3, but I'm unclear on whether it writes a single "file" containing all of the records accumulated up to that point or writes each record individually.

Assuming I set the buffer size / interval to an optimal amount based on the size of my records, does the number of S3 PUTs equal the number of records or the number of flushes that Firehose performs?

Brooks asked Sep 27 '22 07:09



2 Answers

Having read a substantial amount of AWS documentation, I respectfully disagree with the assertion that S3 will not charge you.

You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing. [emphasis mine]

https://aws.amazon.com/kinesis/firehose/pricing/

What they are saying is that Kinesis Firehose will not charge you anything additional for the transfers beyond the $0.035/GB, but you will still pay for the interactions with your bucket. (Data inbound to a bucket is always free of per-gigabyte transfer charges.)

In the final analysis, though, you appear to be in control of the rough number of PUT requests against your bucket, based on some tunable parameters:

Q: What is buffer size and buffer interval?

Amazon Kinesis Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. You can configure buffer size and buffer interval while creating your delivery stream. Buffer size is in MBs and ranges from 1MB to 128MB. Buffer interval is in seconds and ranges from 60 seconds to 900 seconds.

https://aws.amazon.com/kinesis/firehose/faqs/#creating-delivery-streams
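To make the effect of those two settings concrete, here is a rough sketch of how many flushes per day they would produce. It assumes a flush fires as soon as either limit is reached and that data arrives at a constant rate; the function name and parameters are made up for illustration and are not part of any AWS API.

```python
def estimate_flushes_per_day(ingest_mb_per_sec, buffer_mb, buffer_interval_sec):
    """Rough count of buffer flushes per day for one delivery stream.

    ASSUMPTIONS: a flush fires as soon as EITHER the size limit or the
    time limit is reached, and the ingest rate is constant.
    """
    # Limits taken from the FAQ excerpt quoted above.
    if not 1 <= buffer_mb <= 128:
        raise ValueError("buffer size must be between 1 and 128 MB")
    if not 60 <= buffer_interval_sec <= 900:
        raise ValueError("buffer interval must be between 60 and 900 seconds")
    if ingest_mb_per_sec <= 0:
        raise ValueError("ingest rate must be positive")

    # How long the buffer takes to fill at this ingest rate.
    seconds_to_fill = buffer_mb / ingest_mb_per_sec
    # Whichever limit is hit first decides the flush period.
    seconds_per_flush = min(buffer_interval_sec, seconds_to_fill)
    return 86_400 / seconds_per_flush


# Hypothetical workload: 1 MB/s with the largest buffer and longest interval.
print(estimate_flushes_per_day(1.0, buffer_mb=128, buffer_interval_sec=900))
```

At low ingest rates the interval dominates, and at high rates the size limit does, so the flush count never drops below 86,400 / 900 = 96 per day under these assumptions.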

Unless Firehose is collecting and aggregating the records into larger files, I don't see what the point of the buffer size and buffer interval would be. However, without firing up the service and taking it for a spin, I can (unfortunately) only speculate.
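For a sense of how much the answer matters for the bill, here is a back-of-the-envelope comparison of the two possibilities. Everything in it is an assumption: the record rate and buffer interval are hypothetical, the PUT price is a placeholder to be replaced with the current figure from the S3 pricing page, and the "per flush" case assumes one PUT per flush with flushes triggered by the interval alone.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month, for round numbers


def monthly_put_cost(puts_per_month, usd_per_1000_puts):
    """Request charges for a month, given a per-1,000-PUT price."""
    return puts_per_month / 1000 * usd_per_1000_puts


records_per_sec = 1_000      # hypothetical workload
buffer_interval_sec = 300    # hypothetical setting, within the 60-900 s range
usd_per_1000_puts = 0.005    # ASSUMED price; check the S3 pricing page

# Scenario A: every record becomes its own PUT.
puts_if_per_record = records_per_sec * SECONDS_PER_MONTH
# Scenario B: every flush becomes one PUT (speculative, see above).
puts_if_per_flush = SECONDS_PER_MONTH // buffer_interval_sec

cost_per_record = monthly_put_cost(puts_if_per_record, usd_per_1000_puts)
cost_per_flush = monthly_put_cost(puts_if_per_flush, usd_per_1000_puts)

print(f"one PUT per record: {puts_if_per_record:,} PUTs, ${cost_per_record:,.2f}")
print(f"one PUT per flush:  {puts_if_per_flush:,} PUTs, ${cost_per_flush:,.2f}")
```

With these made-up numbers the two scenarios differ by a factor of records_per_sec * buffer_interval_sec = 300,000, which is why it is worth confirming the actual behaviour before committing to an estimate.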

Michael - sqlbot answered Sep 30 '22 07:09



I don't believe you pay anything extra for the write operation to S3 from Firehose.

You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing.

https://aws.amazon.com/kinesis/firehose/pricing/

E.J. Brennan answered Sep 30 '22 07:09
