I'm converting a Kafka consumer to an AWS Kinesis consumer, using the KCL (v2). In Kafka, offsets are used to help a consumer keep track of its most recently-consumed message. If my Kafka app dies, it will use the offset to consume from where it left off when it restarts.
However this isn't the same in Kinesis. I can set kinesisClientLibConfiguration.withInitialPositionInStream(...)
but the only arguments for that are TRIM_HORIZON
, LATEST
or AT_TIMESTAMP
. If my Kinesis app died, it would not know where to resume consuming from when it restarts.
My KCL consumer is very simple. The main()
method looks like:
KinesisClientLibConfiguration config = new KinesisClientLibConfiguration("benTestApp",
"testStream", new DefaultAWSCredentialsProviderChain(), UUID.randomUUID().toString());
config.withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);
Worker worker = new Worker.Builder()
.recordProcessorFactory(new KCLRecordProcessorFactory())
.config(config)
.build();
and the RecordProcessor
is a simple implementation:
@Override
public void initialize(InitializationInput initializationInput) {
LOGGER.info("Initializing record processor for shard: {}", initializationInput.getShardId());
}
@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
List<Record> records = processRecordsInput.getRecords();
LOGGER.info("Retrieved {} records", records.size());
records.forEach(r -> LOGGER.info("Record: {}", StandardCharsets.UTF_8.decode(r.getData())));
}
@Override
public void shutdown(ShutdownInput shutdownInput) {
LOGGER.info("Shutting down input");
}
If I check the corresponding DynamoDB table, the value of checkpoint
is set as TRIM_HORIZON
, and does not get updated with sequenceIds as records are consumed.
What's the solution here to ensure I consume every message?
As identified by @kdgregory, the KCL requires users to set their own checkpoints. Working code:
@Override
public void initialize(InitializationInput initializationInput) {
LOGGER.info("Initializing record processor for shard: {}", initializationInput.getShardId());
}
@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
List<Record> records = processRecordsInput.getRecords();
LOGGER.info("Retrieved {} records", records.size());
records.forEach(r -> LOGGER.info("Record with sequenceId {} at date {} : {}", r.getSequenceNumber(),
r.getApproximateArrivalTimestamp(), StandardCharsets.UTF_8.decode(r.getData())));
try {
processRecordsInput.getCheckpointer().checkpoint();
} catch (InvalidStateException | ShutdownException e) {
LOGGER.error("Unable to checkpoint");
}
}
@Override
public void shutdown(ShutdownInput shutdownInput) {
LOGGER.info("Shutting down input");
try {
shutdownInput.getCheckpointer().checkpoint();
} catch (InvalidStateException | ShutdownException e) {
LOGGER.error("Unable to checkpoint");
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With