
What exactly does sequenceNumberForOrdering do when putting records into a Kinesis stream with the Java SDK?

I'm a bit confused about the AWS docs for putting records to Kinesis stream here: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html#API_PutRecord_RequestSyntax

It says that sequenceNumberForOrdering should be set to guarantee ordering "for puts from the same client and to the same partition key".

The example at the very bottom of this page is what confuses me:

  1. What should be the initial value of the variable sequenceNumberOfPreviousRecord in that example? "0"?
  2. Why does it not seem to matter for which partition key the previous record was put? (The loop in the example puts records for two different partition keys, 0 and 1.)

Maybe I just don't get it, but I think the docs could do a better job of explaining this.

Asked by EagleBeak, Jul 24 '17 14:07


1 Answer

That's a strangely incomplete example: it doesn't show or discuss how sequenceNumberOfPreviousRecord is initialized. I found a slightly better example in the AWS forums; apparently the starting sequence number to use is null.

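import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

// kinesisClient, myStreamName and testData are assumed to be defined elsewhere.
String sequenceNumberOfPreviousRecord = null; // nothing has been put yet
for (int j = 0; j < 200; j++) {
  String data = String.format("%s-%d", testData, 200 + j);
  PutRecordRequest putRecordRequest = new PutRecordRequest();
  putRecordRequest.setStreamName(myStreamName);
  putRecordRequest.setData(ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8)));
  putRecordRequest.setPartitionKey(String.format("partitionKey-%d", j / 5));
  // Pass the sequence number returned by the previous put (null on the first pass).
  putRecordRequest.setSequenceNumberForOrdering(sequenceNumberOfPreviousRecord);
  PutRecordResult putRecordResult = kinesisClient.putRecord(putRecordRequest);
  // Remember this record's sequence number for the next iteration.
  sequenceNumberOfPreviousRecord = putRecordResult.getSequenceNumber();

  System.out.println("Successfully put record, partition key : " + putRecordRequest.getPartitionKey()
      + ", Data : " + data
      + ", SequenceNumber : " + putRecordResult.getSequenceNumber());
}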

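Your example's use of partition keys is odd as well. Kinesis assigns each record to a shard by the MD5 hash of its partition key, so with only two distinct keys, 0 and 1, the records can reach at most two shards and may well end up in the same one. Unless you need records with the same key to stay together on one shard, you're usually best served using a random UUID as the partition key to spread incoming records across your shards.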
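
To see how partition keys spread over shards, here is a rough sketch in plain Java with no AWS SDK involved. It hashes a key with MD5 (which is how Kinesis maps partition keys into its 128-bit hash key space) and then picks a shard index. The class ShardMapper and its methods are made up for illustration, and the even split of the hash key space across shards is an assumption; a real stream's shard ranges can differ, for example after resharding.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper, not part of the AWS SDK.
public class ShardMapper {

    /** MD5 of the UTF-8 encoded partition key, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 = treat bytes as non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Shard index in [0, shardCount), ASSUMING the shards split the
     * 128-bit hash key space into equal, contiguous ranges.
     */
    static int shardFor(String partitionKey, int shardCount) {
        // floor(hash * shardCount / 2^128)
        return hashKey(partitionKey)
                .multiply(BigInteger.valueOf(shardCount))
                .shiftRight(128)
                .intValue();
    }

    public static void main(String[] args) {
        int shardCount = 4; // illustrative value

        // Two fixed keys can never cover more than two shards.
        System.out.println("key 0 -> shard " + shardFor("0", shardCount));
        System.out.println("key 1 -> shard " + shardFor("1", shardCount));

        // Random UUID keys scatter across the shards.
        int[] recordsPerShard = new int[shardCount];
        for (int i = 0; i < 1000; i++) {
            recordsPerShard[shardFor(UUID.randomUUID().toString(), shardCount)]++;
        }
        for (int s = 0; s < shardCount; s++) {
            System.out.println("shard " + s + ": " + recordsPerShard[s] + " records");
        }
    }
}
```

Running main prints the shard chosen for the two fixed keys and a per-shard count for 1000 random UUID keys. The trade-off is that records with different keys have no ordering relationship to each other, so random keys only make sense when per-key ordering is not needed.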

Answered by RaGe, Oct 26 '22 23:10