We have been getting ProvisionedThroughputExceededException errors while writing data to a Kinesis stream.
Case 1: We used a single m4.4xlarge instance (16 cores, 64 GB memory) to write data to the stream, sending 3k requests from JMeter. The EC2 instance gave us 1,100 requests per second, so we chose a 2-shard stream (i.e. 2,000 eps). As a result, we were able to write data to the stream successfully without any loss.
Case 2: For further testing we created a cluster of 10 m4.4xlarge EC2 instances (16 cores, 64 GB memory each) and an 11-shard stream (based on a simple calculation of 1,000 eps per shard, so 10 shards + 1 as spare provision). When we tested that cluster with different request volumes from JMeter, such as 3, 10 and 30 million, we received ProvisionedThroughputExceededException errors in our log file.
On the JMeter side, the EC2 cluster gives us 7,500 eps, and I believe that at 7,500 eps a stream with 11,000 eps capacity should not return such an error.
Could you help me understand the reason behind this issue?
It sounds like Kinesis is not hashing/distributing your data evenly across your shards - some are "hot" (getting the ProvisionedThroughputExceededException), while others are "cold".
To solve this, I recommend using the ExplicitHashKey parameter in order to have control over which shards your data goes to. The PutRecords documentation has some basic info on this (but not as much as it should). The simplest pattern is just to have a single pre-defined ExplicitHashKey for each shard, and have your PutRecords logic just iterate through them, one per record - perfectly even distribution. In any case, make sure your record hashing algorithm will distribute records evenly across the shards.
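A minimal sketch of the "one ExplicitHashKey per shard, round-robin" pattern might look like the following. It assumes the stream's shards split the 128-bit hash key space into equal ranges (typical for a freshly created stream that has not been resharded); in practice you would read each shard's actual HashKeyRange from the stream description instead. The helper names `explicit_hash_keys` and `build_entries` are hypothetical, and the placeholder partition key is only there because PutRecords still requires one.

```python
import itertools

# Kinesis hash keys are 128-bit integers in the range 0 .. 2**128 - 1.
HASH_SPACE = 2 ** 128


def explicit_hash_keys(shard_count):
    """Return one hash key per shard (hypothetical helper).

    ASSUMPTION: the shards divide the hash space into equal ranges.
    Each key is the midpoint of its shard's assumed range, as a decimal string.
    """
    width = HASH_SPACE // shard_count
    return [str(i * width + width // 2) for i in range(shard_count)]


def build_entries(payloads, hash_keys):
    """Build PutRecords request entries, cycling through the hash keys."""
    keys = itertools.cycle(hash_keys)
    return [
        {
            "Data": payload,
            # PartitionKey is still required; ExplicitHashKey overrides its hash.
            "PartitionKey": "placeholder",
            "ExplicitHashKey": next(keys),
        }
        for payload in payloads
    ]


if __name__ == "__main__":
    keys = explicit_hash_keys(11)
    entries = build_entries([b"event"] * 22, keys)
    # With boto3 the batch could then be sent roughly like this
    # (stream name is a made-up example):
    # boto3.client("kinesis").put_records(StreamName="my-stream", Records=entries)
    print(len(entries), "entries spread over", len(keys), "hash keys")
```

Cycling through the keys means each shard receives the same number of records per batch, regardless of what the record contents would otherwise hash to.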
Another alternative/extension based on using ExplicitHashKey is to have a subset of your hashspace dedicated to "overflow" shard(s) - in your case, 1 specific ExplicitHashKey value mapped to one shard - when you start being throttled on your normal shards, send the records there for retry.
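As a rough illustration of the overflow idea, the sketch below takes the entries of a PutRecords request together with the per-record results from its response (which come back in the same order as the request) and re-targets only the throttled records at a dedicated overflow hash key. The function name `reroute_throttled` and the overflow key value are hypothetical; the overflow key is assumed to fall inside the hash range of the shard you have reserved for retries.

```python
THROTTLED = "ProvisionedThroughputExceededException"


def reroute_throttled(entries, response_records, overflow_hash_key):
    """Return retry entries for throttled records, aimed at the overflow shard.

    entries           -- the Records list that was sent to PutRecords
    response_records  -- the Records list from the PutRecords response
    overflow_hash_key -- ASSUMPTION: a decimal hash key inside the overflow shard's range
    """
    retry = []
    for entry, result in zip(entries, response_records):
        # Failed records carry an ErrorCode; successful ones carry a SequenceNumber.
        if result.get("ErrorCode") == THROTTLED:
            retry.append({**entry, "ExplicitHashKey": overflow_hash_key})
    return retry


if __name__ == "__main__":
    sent = [
        {"Data": b"a", "PartitionKey": "placeholder", "ExplicitHashKey": "1"},
        {"Data": b"b", "PartitionKey": "placeholder", "ExplicitHashKey": "2"},
    ]
    # Hand-written stand-in for a PutRecords response, for illustration only.
    results = [
        {"SequenceNumber": "123", "ShardId": "shardId-000000000000"},
        {"ErrorCode": THROTTLED, "ErrorMessage": "Rate exceeded"},
    ]
    print(reroute_throttled(sent, results, overflow_hash_key="999"))
```

The overflow shard has the same per-shard limits as any other, so this only absorbs short bursts; sustained throttling still calls for more shards or a better key distribution.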