Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS Kinesis - how to resume consuming from last checkpoint

I'm converting a Kafka consumer to an AWS Kinesis consumer, using the KCL (v2). In Kafka, offsets are used to help a consumer keep track of its most recently-consumed message. If my Kafka app dies, it will use the offset to consume from where it left off when it restarts.

However this isn't the same in Kinesis. I can set kinesisClientLibConfiguration.withInitialPositionInStream(...) but the only arguments for that are TRIM_HORIZON, LATEST or AT_TIMESTAMP. If my Kinesis app died, it would not know where to resume consuming from when it restarts.

My KCL consumer is very simple. The main() method looks like:

KinesisClientLibConfiguration config = new KinesisClientLibConfiguration("benTestApp",
            "testStream", new DefaultAWSCredentialsProviderChain(), UUID.randomUUID().toString());
config.withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);

Worker worker = new Worker.Builder()
            .recordProcessorFactory(new KCLRecordProcessorFactory())
            .config(config)
            .build();

and the RecordProcessor is a simple implementation:

@Override
public void initialize(InitializationInput initializationInput) {
    LOGGER.info("Initializing record processor for shard: {}", initializationInput.getShardId());
}

@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
    List<Record> records = processRecordsInput.getRecords();
    LOGGER.info("Retrieved {} records", records.size());
    records.forEach(r -> LOGGER.info("Record: {}", StandardCharsets.UTF_8.decode(r.getData())));
}

@Override
public void shutdown(ShutdownInput shutdownInput) {
    LOGGER.info("Shutting down input");
}

If I check the corresponding DynamoDB table, the value of checkpoint is set as TRIM_HORIZON, and does not get updated with sequenceIds as records are consumed.

What's the solution here to ensure I consume every message?

like image 423
Ben Watson Avatar asked Jul 24 '18 08:07

Ben Watson


1 Answers

As identified by @kdgregory, the KCL requires users to set their own checkpoints. Working code:

@Override
public void initialize(InitializationInput initializationInput) {
    LOGGER.info("Initializing record processor for shard: {}", initializationInput.getShardId());
}

@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
    List<Record> records = processRecordsInput.getRecords();
    LOGGER.info("Retrieved {} records", records.size());
    records.forEach(r -> LOGGER.info("Record with sequenceId {} at date {} : {}", r.getSequenceNumber(),
            r.getApproximateArrivalTimestamp(), StandardCharsets.UTF_8.decode(r.getData())));
    try {
        processRecordsInput.getCheckpointer().checkpoint();
    } catch (InvalidStateException | ShutdownException e) {
        LOGGER.error("Unable to checkpoint");
    }
}

@Override
public void shutdown(ShutdownInput shutdownInput) {
    LOGGER.info("Shutting down input");
    try {
        shutdownInput.getCheckpointer().checkpoint();
    } catch (InvalidStateException | ShutdownException e) {
        LOGGER.error("Unable to checkpoint");
    }
}
like image 189
Ben Watson Avatar answered Oct 01 '22 20:10

Ben Watson