
What are shards in a Kinesis data stream?

What are shards and partition keys in a Kinesis data stream? I read the AWS documentation but I don't get it. Can someone explain them in simple terms?

Desp asked Jun 09 '19 13:06





1 Answer

From Amazon Kinesis Data Streams Terminology and Concepts - Amazon Kinesis Data Streams:

A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards.
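Because stream capacity is just the sum of the shard capacities, a rough shard count can be estimated from the limits quoted above. The following is only a sketch: the function name `shards_needed` and the example workload numbers are made up for illustration, and it ignores the 5 reads-per-second limit and any headroom you might want.

```python
import math

# Per-shard limits taken from the documentation quote above.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


# Hypothetical workload: 3.5 MB/s in, 2,500 records/s in, 7 MB/s out.
print(shards_needed(3.5, 2500, 7.0))  # prints 4
```

In this example the write and read data rates each require 4 shards and the record rate requires 3, so the largest requirement wins.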

So, a shard has two purposes:

  • A certain amount of capacity/throughput
  • An ordered list of messages

If your application must process all messages in order, then you can only use one shard. Think of it as a line at a bank: if there is one line, then everybody gets served in order.

However, if messages only need to be ordered within a certain subset of messages, the subsets can be sent to separate shards. For example, think of multiple lines in a bank, where each line gets served in order. Or, think of buses sending GPS coordinates. Each bus sends messages to only a single shard. A shard might contain messages from multiple buses, but each bus only sends to one shard. This way, when the messages from that shard are processed, all messages from a particular bus are processed in order.

This is controlled by using a Partition Key, which identifies the source. The partition key is hashed, and the resulting hash value determines which shard the message is assigned to. Thus, all messages with the same partition key will go to the same shard.
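To make the hashing step concrete, here is a small local simulation. Kinesis maps partition keys to 128-bit integers with MD5 and gives each shard a range of those hash values. The sketch assumes the shards split the hash space evenly (as in a freshly created stream that has not been resharded); the function name `shard_for_key` and the bus IDs are hypothetical, and nothing here calls AWS.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive this partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Assumption: shard i owns an equal, contiguous slice of the hash space.
    return hash_value * shard_count // HASH_SPACE


# The same key always lands on the same shard, so per-bus order is preserved.
for bus_id in ["bus-17", "bus-42", "bus-17", "bus-99", "bus-42"]:
    print(bus_id, "-> shard", shard_for_key(bus_id, shard_count=4))
```

Running it shows that every message from `bus-17` goes to one shard, while different buses may or may not share a shard.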

At the back-end, there is typically one worker per shard that processes the messages, in order, from that shard.

If your system does not care about preserving message order, then use a random partition key. This means the message will be sent to any shard.
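As a rough illustration of the random-key approach, the simulation below uses a fresh UUID as the partition key for each message and counts how many messages each shard would receive. It is a sketch under assumptions: the shards are assumed to split the MD5 hash space evenly, the helper names are made up, and no AWS call is involved.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index (assumes evenly split shards)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * shard_count // HASH_SPACE


def simulate_random_keys(message_count: int, shard_count: int) -> Counter:
    """Count messages per shard when every message gets a random key."""
    counts = Counter()
    for _ in range(message_count):
        random_key = str(uuid.uuid4())  # a new key for every message
        counts[shard_for_key(random_key, shard_count)] += 1
    return counts


counts = simulate_random_keys(message_count=10_000, shard_count=4)
for shard_index in sorted(counts):
    print("shard", shard_index, "received", counts[shard_index], "messages")
```

The counts come out roughly equal, which is the point: random keys spread the load across shards, at the cost of giving up any ordering between related messages.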

John Rotenstein answered Oct 14 '22 02:10
