OK, I'll start with an elaborated use-case and will explain my question:
... TRIM_HORIZON iterator type; ... GetShardIteratorRequest class;
My problem is that the data I retrieve is inconsistent and has no chronological logic in it.
When I use AT_SEQUENCE_NUMBER and provide the first sequence number from the shard with .getSequenceNumberRange().getStartingSequenceNumber(); ... as the ``, I'm not getting all records. Similarly with AFTER_SEQUENCE_NUMBER;
When I use LATEST, I'm getting zero results;
When I use TRIM_HORIZON, which should make sense to use, it doesn't seem to be working properly. It used to give me the data, but then I added new "events" (records to the final stream) and received zero records. Mystery.
My questions are:
... ShardIteratorRequest?
... TRIM_HORIZON method?
Thanks in advance, I'd really love to learn a bit more about data consumption from a Kinesis stream.
I understand the confusion above, and I had the same issues, but I think I've figured it out now. Note that I am using the JSON API directly without KCL.
It seems that the API gives clients two basic choices of iterator when they begin consuming a stream:
A) TRIM_HORIZON: for reading PAST records, meaning those put anywhere from many minutes (even hours) ago up to 24 hours ago. It doesn't return recently put records. Using AFTER_SEQUENCE_NUMBER on the last record seen by this iterator returns an empty array even when records have recently been PUT.
B) LATEST: for reading FUTURE records in real time (immediately after they are PUT). I was tricked by the only sentence of documentation I could find on this: "Start reading just after the most recent record in the shard, so that you always read the most recent data in the shard." You were getting an empty array because no records had been PUT since you got the iterator. If you get this type of iterator and then PUT a record, that record is immediately available.
Lastly, if you know the sequence number of a recently put record, you can get it immediately using AT_SEQUENCE_NUMBER, and you can get later records using AFTER_SEQUENCE_NUMBER, even though they won't appear to a TRIM_HORIZON iterator.
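To make the four iterator types above concrete, here is a minimal sketch in Python of the JSON bodies a GetShardIterator call takes. The helper name shard_iterator_request and the stream name, shard ID and sequence number in the demo are made up for illustration, and the helper deliberately covers only the four types discussed in this thread. It builds the request body only; signing and sending the request are left out.

```python
import json

# Iterator types discussed in this thread, split by whether they need a sequence number.
NEEDS_SEQUENCE_NUMBER = {"AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"}
NO_SEQUENCE_NUMBER = {"TRIM_HORIZON", "LATEST"}


def shard_iterator_request(stream_name, shard_id, iterator_type, sequence_number=None):
    """Build the JSON body for a GetShardIterator call (hypothetical helper)."""
    if iterator_type in NEEDS_SEQUENCE_NUMBER:
        if sequence_number is None:
            raise ValueError(f"{iterator_type} requires a sequence number")
    elif iterator_type in NO_SEQUENCE_NUMBER:
        if sequence_number is not None:
            raise ValueError(f"{iterator_type} does not take a sequence number")
    else:
        raise ValueError(f"unknown iterator type: {iterator_type}")

    body = {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": iterator_type,
    }
    if sequence_number is not None:
        # Sequence numbers travel as strings in the JSON API.
        body["StartingSequenceNumber"] = str(sequence_number)
    return body


if __name__ == "__main__":
    # Placeholder values, not taken from a real stream.
    for kind, seq in [("TRIM_HORIZON", None), ("LATEST", None),
                      ("AFTER_SEQUENCE_NUMBER", "12345")]:
        print(json.dumps(shard_iterator_request("my-stream", "shardId-000000000000", kind, seq)))
```

Rejecting a missing sequence number up front is a design choice of this sketch; it fails before any request is built, so the mistake shows up locally.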
The above does mean that if you want to read all known past records and future records in real time, you have to use a combination of A and B, with logic to cope with the records in between (the recent past). The KCL may well smooth over this.
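One way to cope with the overlap between those reads, as a rough sketch rather than anything the KCL is known to do: collect the batches from each iterator, drop duplicates by sequence number, and order what is left. The helper names merge_records and last_sequence_number are hypothetical. The sketch assumes each record is a dict with a "SequenceNumber" field, as in a GetRecords response, and that all batches come from a single shard.

```python
def merge_records(*batches):
    """Merge record batches, dropping duplicates and ordering by sequence number."""
    by_sequence = {}
    for batch in batches:
        for record in batch:
            # Sequence numbers are decimal strings; compare them as integers,
            # because plain string comparison misorders numbers of different length.
            by_sequence.setdefault(int(record["SequenceNumber"]), record)
    return [by_sequence[key] for key in sorted(by_sequence)]


def last_sequence_number(records):
    """Return the highest sequence number in a batch, or None if it is empty."""
    if not records:
        return None
    return max((r["SequenceNumber"] for r in records), key=int)


if __name__ == "__main__":
    # Made-up batches standing in for a TRIM_HORIZON read, an
    # AFTER_SEQUENCE_NUMBER read of the recent past, and a LATEST read.
    past = [{"SequenceNumber": "90", "Data": "z"}, {"SequenceNumber": "100", "Data": "a"}]
    gap = [{"SequenceNumber": "100", "Data": "a"}, {"SequenceNumber": "1000", "Data": "b"}]
    live = [{"SequenceNumber": "1000", "Data": "b"}, {"SequenceNumber": "2000", "Data": "c"}]
    for record in merge_records(past, gap, live):
        print(record["SequenceNumber"], record["Data"])
```

The value from last_sequence_number on the past batch is what you would pass to an AFTER_SEQUENCE_NUMBER iterator to fetch the recent past.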