How to use ExplicitHashKey for round robin stream assignment in AWS Kinesis

Tags: amazon-kinesis, amazon-kcl

I am trying to pump lots of data through Amazon Kinesis (on the order of 10,000 data points per second).

In order to maximize records per second through my shards, I'd like to round robin my requests over the shards (my application logic doesn't care what shard individual messages go to).

It seems I could do this with the ExplicitHashKey parameter on each record in the list I send to the PutRecords endpoint. However, the Amazon documentation doesn't actually describe how to use ExplicitHashKey, other than this oracular statement:

http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecords.html

Each record in the Records array may include an optional parameter, ExplicitHashKey, which overrides the partition key to shard mapping. This parameter allows a data producer to determine explicitly the shard where the record is stored. For more information, see Adding Multiple Records with PutRecords in the Amazon Kinesis Streams Developer Guide.

(The statement above links to another section of the documentation, which does not discuss ExplicitHashKey at all.)

Is there a way to use ExplicitHashKey to round robin data among shards?

What are valid values for the parameter?


asked Jun 16 '17 16:06

deadcode

1 Answer

Each shard is assigned a contiguous range of 128-bit integers, and together the ranges of the open shards cover 0 to 2^128 - 1.

You may find the range of integers assigned to a given shard in a stream via the AWS CLI:

aws kinesis describe-stream --stream-name name-of-your-stream

The output will look like:

{
    "StreamDescription": {
        "RetentionPeriodHours": 24, 
        "StreamStatus": "ACTIVE", 
        "StreamName": "name-of-your-stream", 
        "StreamARN": "arn:aws:kinesis:us-west-2:your-stream-info", 
        "Shards": [
           {
                "ShardId": "shardId-000000000113", 
                "HashKeyRange": {
                    "EndingHashKey": "14794885518301672324494548149207313541", 
                    "StartingHashKey": "0"
                }, 
                "ParentShardId": "shardId-000000000061", 
                "SequenceNumberRange": {
                    "StartingSequenceNumber": "49574208032121771421311268772132530603758174814974510866"
                }
            }, 
           { ... more shards ... }
       ...

You may set the ExplicitHashKey of a record to the decimal string representation of any integer within a shard's hash key range to force the record to be sent to that particular shard.

Note that due to prior merge and split operations on your stream, there may be many shards with overlapping HashKeyRanges, because closed parent shards are still listed. The currently open shards are the ones that do not have a SequenceNumberRange.EndingSequenceNumber element.
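The open-shard check can be sketched in Python. This is a minimal illustration that assumes the describe-stream output has already been parsed into a dict (for example with json.loads). The function name open_shards and the sample data are hypothetical, and the sample hash keys are tiny stand-ins for real 128-bit values.

```python
def open_shards(stream_description):
    """Return the shards whose SequenceNumberRange has no EndingSequenceNumber."""
    return [
        shard
        for shard in stream_description["Shards"]
        if "EndingSequenceNumber" not in shard["SequenceNumberRange"]
    ]


# Hypothetical, abbreviated stand-in for the "StreamDescription" object:
# one closed parent shard and the two open children it was split into.
sample_description = {
    "Shards": [
        {
            "ShardId": "shardId-000000000000",
            "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": "199"},
            "SequenceNumberRange": {
                "StartingSequenceNumber": "100",
                "EndingSequenceNumber": "200",
            },
        },
        {
            "ShardId": "shardId-000000000001",
            "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": "99"},
            "SequenceNumberRange": {"StartingSequenceNumber": "201"},
        },
        {
            "ShardId": "shardId-000000000002",
            "HashKeyRange": {"StartingHashKey": "100", "EndingHashKey": "199"},
            "SequenceNumberRange": {"StartingSequenceNumber": "202"},
        },
    ]
}

open_ids = [shard["ShardId"] for shard in open_shards(sample_description)]
print(open_ids)
```

In this made-up example the parent shard overlaps both children, which is why filtering on EndingSequenceNumber is needed before picking hash keys.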

You can round robin requests among a set of shards by identifying a 128-bit integer within the range of each shard of interest, and assigning the string representations of those numbers in turn to each record's ExplicitHashKey.
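A minimal Python sketch of that round robin, under some assumptions: the list of open shards has already been extracted from the describe-stream output, the midpoint of each range is used as the representative hash key (any value in the range would do), and the helper names, sample shards, and placeholder partition key are hypothetical. PartitionKey is still a required field of each PutRecords entry even when ExplicitHashKey overrides it.

```python
from itertools import cycle


def midpoint_hash_keys(shards):
    """Pick one hash key per shard: the midpoint of its HashKeyRange, as a decimal string."""
    keys = []
    for shard in shards:
        start = int(shard["HashKeyRange"]["StartingHashKey"])
        end = int(shard["HashKeyRange"]["EndingHashKey"])
        keys.append(str((start + end) // 2))
    return keys


def build_records(payloads, hash_keys):
    """Build PutRecords entries, cycling through the per-shard hash keys."""
    key_cycle = cycle(hash_keys)
    return [
        {
            "Data": payload,
            "PartitionKey": "placeholder",  # required, but overridden by ExplicitHashKey
            "ExplicitHashKey": next(key_cycle),
        }
        for payload in payloads
    ]


# Hypothetical open shards with tiny ranges standing in for real 128-bit ranges.
sample_shards = [
    {"ShardId": "shardId-000000000001",
     "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": "99"}},
    {"ShardId": "shardId-000000000002",
     "HashKeyRange": {"StartingHashKey": "100", "EndingHashKey": "199"}},
]

records = build_records([b"a", b"b", b"c"], midpoint_hash_keys(sample_shards))
for record in records:
    print(record["Data"], record["ExplicitHashKey"])
```

With boto3, a list shaped like records could be passed as the Records argument of put_records along with the stream name. That call is not made here, so the sketch runs without AWS credentials.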

As a side note, you can also calculate the hash key that a given PartitionKey maps to:

1. Compute the MD5 sum of the partition key.
2. Interpret the MD5 sum as a hexadecimal number and convert it to base 10. This will be the hash key for that partition key. You can then look up which shard that hash key falls into.
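The two steps above can be sketched in Python with the standard hashlib module. The function names and the two-shard layout are hypothetical, and the sketch assumes the partition key is encoded as UTF-8 before hashing.

```python
import hashlib


def partition_key_to_hash_key(partition_key):
    """MD5 the partition key and read the 128-bit digest as an integer."""
    digest_hex = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest_hex, 16)


def shard_for_hash_key(hash_key, shards):
    """Return the ShardId whose HashKeyRange contains hash_key, or None."""
    for shard in shards:
        start = int(shard["HashKeyRange"]["StartingHashKey"])
        end = int(shard["HashKeyRange"]["EndingHashKey"])
        if start <= hash_key <= end:
            return shard["ShardId"]
    return None


# Hypothetical stream with two open shards that split the key space in half.
example_shards = [
    {"ShardId": "shardId-000000000000",
     "HashKeyRange": {"StartingHashKey": "0",
                      "EndingHashKey": str(2**127 - 1)}},
    {"ShardId": "shardId-000000000001",
     "HashKeyRange": {"StartingHashKey": str(2**127),
                      "EndingHashKey": str(2**128 - 1)}},
]

hash_key = partition_key_to_hash_key("my-partition-key")
print(hash_key, shard_for_hash_key(hash_key, example_shards))
```

The shards passed to shard_for_hash_key should be the open shards only, since closed shards can have overlapping ranges.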


answered Jan 04 '23 04:01

deadcode
