Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why checkpoint is needed on an Amazon Kinesis stream when shutting down a shard?

When splitting a shard into 2 child shards, the parent shard is shutdown. It is expecting that the record processor(KCL is being used here) would checkpoint when this happens as the following KCL source code shows:

try {
                recordProcessor.shutdown(recordProcessorCheckpointer, reason);
                String lastCheckpointValue = recordProcessorCheckpointer.getLastCheckpointValue();
                if (reason == ShutdownReason.TERMINATE) {
                    if ((lastCheckpointValue == null)
                            || (!lastCheckpointValue.equals(SentinelCheckpoint.SHARD_END.toString()))) {
                        throw new IllegalArgumentException("Application didn't checkpoint at end of shard "
                                + shardInfo.getShardId());
                    }
                }

The questions are:

  • Is this checkpoint indispensable?

  • What happens if the record processor does not checkpoint and absorbs the exception?

The reason I am asking is because in my use case I want to make sure that every record from the stream has been processed to s3, now if the shard is shutdown, there might be items which have not been flushed yet and therefore i want to make sure they would be resent to the new consumer/worker of the child-shard?

They wouldn't be resent if I checkpoint.

Any ideas?

Thx in advance.

like image 441
isaac.hazan Avatar asked Mar 09 '15 18:03

isaac.hazan


People also ask

What is the relationship between a stream and shards within Amazon Kinesis?

A shard has a sequence of data records in a stream. It serves as a base throughput unit of a Kinesis data stream. A shard supports 1 MB/second and 1,000 records per second for writes and 2 MB/second for reads.

How many shards can a Kinesis stream have?

The throughput of an Amazon Kinesis data stream is designed to scale without limits. The default shard quota is 500 shards per stream for the following Amazon Web Services Regions: US East (N.

How much data can a single shard in Kinesis data stream handle?

A single shard can ingest up to 1 MB of data per second (including partition keys) or 1,000 records per second for writes. Similarly, if you scale your stream to 5,000 shards, the stream can ingest up to 5 GB per second or 5 million records per second.

How do I stop Kinesis streaming data?

Shutting Down an Amazon Kinesis Data Streams Application When your Amazon Kinesis Data Streams application has completed its intended task, you should shut it down by terminating the EC2 instances on which it is running. You can terminate the instances using the AWS Management Console or the AWS CLI.


1 Answers

Items don't move between shards. After re-sharding, new records are put into new shards, but old records are never transferred from the parent shard, and no more new records are added to the (now closed) parent shard. Data persists in the parent shard for its normal 24 hour lifespan even after it is closed. Your record processor would only be shutdown after it has reached the end of the data from the parent shard.

http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-using-sdk-java-after-resharding.html

BTW, as you probably know the SDK API is difficult, and the client library isn't a whole lot better. Try the connector library, which is a much better API and includes an example of an S3 archiving application.

https://github.com/awslabs/amazon-kinesis-connectors

like image 160
engineerC Avatar answered Sep 28 '22 18:09

engineerC