When splitting a shard into two child shards, the parent shard is shut down. The KCL (which is what I am using here) expects the record processor to checkpoint when this happens, as the following KCL source code shows:
try {
    recordProcessor.shutdown(recordProcessorCheckpointer, reason);
    String lastCheckpointValue = recordProcessorCheckpointer.getLastCheckpointValue();
    if (reason == ShutdownReason.TERMINATE) {
        if ((lastCheckpointValue == null)
                || (!lastCheckpointValue.equals(SentinelCheckpoint.SHARD_END.toString()))) {
            throw new IllegalArgumentException("Application didn't checkpoint at end of shard "
                    + shardInfo.getShardId());
        }
    }
    // ... (rest of the try/catch block omitted)
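For reference, a shutdown() along the lines of the sketch below should satisfy that check (this assumes the KCL v1 IRecordProcessor/IRecordProcessorCheckpointer interfaces; imports omitted, and it is only an illustration):

// Part of a class implementing the KCL's IRecordProcessor interface.
@Override
public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
    if (reason == ShutdownReason.TERMINATE) {
        try {
            // Checkpointing once the closed shard has been fully read records the
            // SHARD_END sentinel, which is what the code above looks for.
            checkpointer.checkpoint();
        } catch (Exception e) {
            // If this failure is swallowed, lastCheckpointValue never becomes
            // SHARD_END and the KCL code above throws IllegalArgumentException.
        }
    }
}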
The questions are:
Is this checkpoint indispensable?
What happens if the record processor does not checkpoint and simply swallows the exception?
The reason I am asking is that in my use case I want to make sure every record from the stream has been persisted to S3. If the shard is shut down, there might be items that have not been flushed yet, so I want to make sure they get resent to the new consumer/worker of the child shard.
They wouldn't be resent if I checkpoint.
Any ideas?
Thanks in advance.
Items don't move between shards. After resharding, new records are put into the new child shards, but old records are never transferred out of the parent shard, and no more new records are added to the (now closed) parent shard. Data persists in the parent shard for its normal 24-hour lifespan even after it is closed. Your record processor is only shut down after it has reached the end of the data in the parent shard.
http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-using-sdk-java-after-resharding.html
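Since TERMINATE only arrives after processRecords() has been handed every record the parent shard will ever contain, you can simply flush whatever you have buffered and then checkpoint inside shutdown(), and nothing needs to be resent. A rough sketch of that pattern (imports omitted; writeBatchToS3() is a hypothetical placeholder for your own S3 upload, not a library call):

public class S3ArchivingProcessor implements IRecordProcessor {

    private final List<Record> buffer = new ArrayList<Record>();
    private String shardId;

    @Override
    public void initialize(String shardId) {
        this.shardId = shardId;
    }

    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        buffer.addAll(records);
        // Normally you would also flush and checkpoint here once the buffer reaches some size.
    }

    @Override
    public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
        if (reason == ShutdownReason.TERMINATE) {
            try {
                // Every record of the closed parent shard has already been passed to
                // processRecords(), so flushing here means nothing buffered is lost.
                writeBatchToS3(shardId, buffer);
                buffer.clear();
                checkpointer.checkpoint(); // records SHARD_END, satisfying the KCL check
            } catch (Exception e) {
                // If the flush fails, skipping the checkpoint lets the shard be
                // reprocessed from the last checkpoint instead of dropping the buffer.
                throw new RuntimeException(e);
            }
        }
    }

    private void writeBatchToS3(String shardId, List<Record> records) {
        // Placeholder: upload the batch to S3, e.g. with the AWS SDK's AmazonS3 client.
    }
}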
BTW, as you probably know, the SDK API is difficult to work with, and the Client Library isn't a whole lot better. Try the connector library, which has a much better API and includes an example of an S3 archiving application.
https://github.com/awslabs/amazon-kinesis-connectors