 

How to decide the total number of partition keys in an AWS Kinesis stream?

In a producer-consumer web application, what should be the thought process for creating a partition key for a Kinesis stream shard? Suppose I have a Kinesis stream with 16 shards: how many partition keys should I create? Is it really dependent on the number of shards?

asked Jul 10 '15 by shivba

People also ask

What is partition key in Kinesis data stream?

A partition key is used to group data by shard within a stream. Kinesis Data Streams segregates the data records belonging to a stream into multiple shards. It uses the partition key that is associated with each data record to determine which shard a given data record belongs to.
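For illustration, here is a minimal Java sketch of that mapping: Kinesis MD5-hashes the partition key into an unsigned 128-bit integer and places the record on the shard whose hash key range contains that value (the key "user-42" below is just a made-up example):

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class PartitionKeyHash {
        public static void main(String[] args) throws Exception {
            String partitionKey = "user-42"; // hypothetical partition key
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            // Interpret the 16-byte digest as an unsigned 128-bit integer (0 .. 2^128 - 1).
            BigInteger hashKey = new BigInteger(1, digest);
            // The record lands on the shard whose hash key range contains this value.
            System.out.println(hashKey);
        }
    }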

How many shards can a Kinesis stream have?

The throughput of a Kinesis data stream is designed to scale without limits. The default shard quota is 500 shards per stream for the following AWS Regions: US East (N. Virginia), US West (Oregon), and Europe (Ireland). For all other Regions, the default shard quota is 200 shards per stream.

How can you scale an Amazon Kinesis data stream that is reaching capacity?

Currently, you scale an Amazon Kinesis Data Stream shard programmatically. Alternatively, you can use the Amazon Kinesis Scaling Utilities. To do so, you can use each utility manually, or automated with an AWS Elastic Beanstalk environment.
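One programmatic route is the UpdateShardCount API, which resizes a stream in a single call. A rough sketch only, assuming the AWS SDK for Java v1; the stream name and target shard count are placeholders:

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.ScalingType;
    import com.amazonaws.services.kinesis.model.UpdateShardCountRequest;

    public class ScaleStream {
        public static void main(String[] args) {
            AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
            // Double the stream from 16 to 32 shards; Kinesis performs the uniform splits for you.
            kinesis.updateShardCount(new UpdateShardCountRequest()
                    .withStreamName("my-stream")          // placeholder stream name
                    .withTargetShardCount(32)
                    .withScalingType(ScalingType.UNIFORM_SCALING));
        }
    }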

What is the maximum total data read rate of one shard per second in Kinesis stream?

Each shard can support up to a maximum total data read rate of 2 MB per second via GetRecords. If a call to GetRecords returns 10 MB, subsequent calls made within the next 5 seconds throw an exception.
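A minimal consumer sketch that respects these limits (assuming the AWS SDK for Java v1; the stream name and shard ID are placeholders) backs off when the per-shard read limit is exceeded:

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.*;

    public class ShardReader {
        public static void main(String[] args) throws InterruptedException {
            AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
            String iterator = kinesis.getShardIterator(new GetShardIteratorRequest()
                    .withStreamName("my-stream")             // placeholder stream name
                    .withShardId("shardId-000000000000")     // placeholder shard ID
                    .withShardIteratorType(ShardIteratorType.TRIM_HORIZON))
                    .getShardIterator();
            while (iterator != null) {
                try {
                    GetRecordsResult result = kinesis.getRecords(
                            new GetRecordsRequest().withShardIterator(iterator).withLimit(1000));
                    System.out.println("got " + result.getRecords().size() + " records");
                    iterator = result.getNextShardIterator();
                } catch (ProvisionedThroughputExceededException e) {
                    // Exceeded the shard's read throughput; keep the same iterator and retry after the pause.
                }
                Thread.sleep(1000); // stay within the per-shard GetRecords call rate
            }
        }
    }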


1 Answer

Partition (or hash) key space: it ranges from 0 up to 340282366920938463463374607431768211455 (2^128 - 1). Let's call that roughly 34020 * 10^34; I will omit the 10^34 factor below for readability.

If you have 30 shards, uniformly divided, each should cover about 1134 (* 10^34) hash keys. The coverage would look like this:

    Shard-00: 0 - 1134
    Shard-01: 1135 - 2268
    Shard-02: 2269 - 3402
    Shard-03: 3403 - 4536
    ...
    Shard-27: 30619 - 31752
    Shard-28: 31753 - 32886
    Shard-29: 32887 - 34020

And if you have 3 consumer applications (listening to these 30 shards), each should listen to 10 shards (optimally balanced).
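To see the actual hash key range each of your shards covers, you can describe the stream. A rough sketch with the AWS SDK for Java v1 (the stream name is a placeholder, and shard-list pagination is omitted):

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.DescribeStreamRequest;
    import com.amazonaws.services.kinesis.model.Shard;

    public class ShowShardRanges {
        public static void main(String[] args) {
            AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
            // Print the hash key range covered by each shard of the stream.
            for (Shard shard : kinesis.describeStream(
                        new DescribeStreamRequest().withStreamName("my-stream")) // placeholder stream name
                    .getStreamDescription().getShards()) {
                System.out.println(shard.getShardId() + ": "
                        + shard.getHashKeyRange().getStartingHashKey() + " - "
                        + shard.getHashKeyRange().getEndingHashKey());
            }
        }
    }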

This also explains Merge and Split operations on a Stream.

  • To merge 2 shards, they must cover adjacent hash key ranges. You cannot merge Shard-03 and Shard-29.
  • You can split any shard. If you split Shard-00 in the middle, the distribution will look like this:

    Shard-30: 0 - 567
    Shard-31: 568 - 1134
    Shard-01: 1135 - 2268
    Shard-02: 2269 - 3402
    Shard-03: 3403 - 4536
    ...
    Shard-27: 30619 - 31752
    Shard-28: 31753 - 32886
    Shard-29: 32887 - 34020

See, Shard-00 will no longer accept new data. New records put into the Kinesis stream with partition keys hashing into that range (formerly Shard-00's) will now be placed in Shard-30 or Shard-31.
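If you drive resharding yourself, the SplitShard and MergeShards APIs express exactly these rules. A rough sketch with the AWS SDK for Java v1 (stream name, shard IDs and the split point are placeholders matching the example above):

    import java.math.BigInteger;

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.MergeShardsRequest;
    import com.amazonaws.services.kinesis.model.SplitShardRequest;

    public class Resharding {
        public static void main(String[] args) {
            AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

            // Split Shard-00 in the middle: the second child starts at 568 * 10^34.
            String newStartingHashKey =
                    BigInteger.valueOf(568).multiply(BigInteger.TEN.pow(34)).toString();
            kinesis.splitShard(new SplitShardRequest()
                    .withStreamName("my-stream")
                    .withShardToSplit("shardId-000000000000")   // placeholder ID for Shard-00
                    .withNewStartingHashKey(newStartingHashKey));

            // Merge two shards that cover adjacent hash key ranges (e.g. re-merging the two children).
            kinesis.mergeShards(new MergeShardsRequest()
                    .withStreamName("my-stream")
                    .withShardToMerge("shardId-000000000030")
                    .withAdjacentShardToMerge("shardId-000000000031"));
        }
    }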

While sending data to Kinesis (i.e. on the producer side), you should not worry about which shard the data goes to. Using a random value (a UUID, or the current timestamp in milliseconds) as the partition key is best for scaling and distributing the data evenly across shards. Unless you care about the ordering of records within a single shard, it is best to choose a random or constantly changing partition key for each put_record request.

In Java, "putRecordsRequestEntry.setPartitionKey(Long.toString(System.currentTimeMillis()))" or "putRecordRequest.setPartitionKey(Long.toString(System.currentTimeMillis()))" are examples.
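A more complete producer sketch (assuming the AWS SDK for Java v1; the stream name is a placeholder) that uses a random UUID as the partition key:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    import com.amazonaws.services.kinesis.AmazonKinesis;
    import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
    import com.amazonaws.services.kinesis.model.PutRecordRequest;
    import com.amazonaws.services.kinesis.model.PutRecordResult;

    public class RandomKeyProducer {
        public static void main(String[] args) {
            AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
            PutRecordResult result = kinesis.putRecord(new PutRecordRequest()
                    .withStreamName("my-stream")                      // placeholder stream name
                    .withPartitionKey(UUID.randomUUID().toString())   // random key spreads records across shards
                    .withData(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8))));
            System.out.println("stored in " + result.getShardId());
        }
    }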

answered Sep 19 '22 by az3