How do DynamoDB streams distribute records to shards?

My goal is to ensure that records published by a DynamoDB stream are processed in the "correct" order. My table contains events for customers: the hash key is the event ID, and the range key is a timestamp. "Correct" order means that events for the same customer ID are processed in order; different customer IDs can be processed in parallel.

I'm consuming the stream via Lambda functions, and consumers are spawned automatically per shard. So if the runtime decides to shard the stream, consumption happens in parallel (if I understand this correctly), and I run the risk of processing a CustomerAddressChanged event before CustomerCreated (for example).

The docs imply that there is no way to influence the sharding. But they don't say so explicitly. Is there a way, e.g., by using a combination of customer ID and timestamp for the range key?
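For concreteness, a minimal sketch of the consumer setup described above; the handler name and the attribute names (EventId, Timestamp) are illustrative assumptions:

```python
def handler(event, context):
    # Lambda spawns one concurrent invocation per shard, so records within
    # a single invocation arrive in order, but separate shards are processed
    # in parallel; that parallelism is where the ordering risk comes from.
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        event_id = keys["EventId"]["S"]     # hash key (event ID)
        timestamp = keys["Timestamp"]["S"]  # range key (timestamp)
        print(f"{record['eventName']}: event {event_id} at {timestamp}")
```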

asked May 30 '17 by EagleBeak

People also ask

How does DynamoDB Sharding work?

Write sharding is a mechanism for distributing writes across a DynamoDB table's partitions more evenly. It increases write throughput per partition key by spreading the write operations for a single partition key across multiple partitions.

How does DynamoDB streams work?

DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.
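As an illustration (the table name is a placeholder), enabling such a stream with both before and after images might look like this:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# NEW_AND_OLD_IMAGES stores each item as it appeared both before and
# after the modification; the stream retains records for up to 24 hours.
dynamodb.update_table(
    TableName="Events",  # placeholder table name
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```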

What is shards DynamoDB stream?

Shards in DynamoDB streams are collections of stream records. Each stream record represents a single data modification in the DynamoDB table to which the stream belongs. The following diagram shows the relationship between a stream, shards in the stream, and stream records in the shards.

Does DynamoDB use sharding?

One way to better distribute writes across a partition key space in Amazon DynamoDB is to expand the space. You can do this in several different ways. You can add a random number to the partition key values to distribute the items among partitions.
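A rough sketch of that random-suffix technique (table and attribute names are assumptions). Note that write sharding deliberately spreads one logical key across partitions, so it would defeat the per-customer ordering the question is after; it is shown only to illustrate the term:

```python
import random

import boto3

dynamodb = boto3.client("dynamodb")
N_SUFFIXES = 10  # number of artificial sub-keys per logical key

def put_event(customer_id: str, timestamp: str, payload: str) -> None:
    # Append a random suffix so writes for one customer spread over
    # N_SUFFIXES partition key values; reads must query all suffixes.
    sharded_key = f"{customer_id}#{random.randrange(N_SUFFIXES)}"
    dynamodb.put_item(
        TableName="Events",  # placeholder table name
        Item={
            "CustomerId": {"S": sharded_key},
            "Timestamp": {"S": timestamp},
            "Payload": {"S": payload},
        },
    )
```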


2 Answers

The assumption that sharding is determined by table keys seems to be correct. My solution will be to use the customer ID as the hash key and the timestamp (or event ID) as the range key.

This AWS blog says:

The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.

This slide confirms it. I still wish the DynamoDB docs would explicitly say so...
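A sketch of that key schema (names and billing mode are illustrative): with the customer ID as hash key, all of a customer's events live on one partition, share a shard, and therefore appear in the stream in order; the timestamp range key orders them within the customer.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Events",  # placeholder table name
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},  # per-customer ordering
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```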

answered Sep 21 '22 by EagleBeak


I just had a response from AWS support. It seems to confirm @EagleBeak's assumption about partitions being mapped to shards. Or, as I understand it, a partition is mapped to a shard tree.

My question was about REMOVE events due to TTL expiration, but it would apply to all other types of actions too.

  1. Is a shard created per primary partition key? And if there are too many items in the same partition, does the shard get split into children?

A shard is created per partition in your DynamoDB table. If a partition split is required due to too many items in the same partition, the shard gets split into children as well. A shard might split in response to high levels of write activity on its parent table, so that applications can process records from multiple shards in parallel.

  • https://aws.amazon.com/blogs/database/dynamodb-streams-use-cases-and-design-patterns/
  • https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
  2. Will those 100 removed items be put in just one shard, provided they all have the same partition key?

Assuming all 100 items have the same partition key value (but different sort key values), they would have been stored on the same partition. Therefore, they would be removed from the same partition and be put in the same shard.

  3. Since "records sent to your AWS Lambda function are strictly serialized", how does this serialisation work in the case of TTL? Is order within a shard established by partition/sort keys, TTL expiration, etc.?

DynamoDB Streams captures a time-ordered sequence of item-level modifications in your DynamoDB table. This time-ordered sequence is preserved at a per-shard level. In other words, the order within a shard is established based on the order in which items were created, updated, or deleted. (A rough sketch of reading shards in that order follows this list.)

  • https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
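To make the parent/child relationship concrete, here is a rough boto3 sketch (mine, not from AWS support) that walks a stream's shard tree, finishing each parent shard before starting its children so a given key is never read out of order across a split:

```python
import boto3

streams = boto3.client("dynamodbstreams")

def process_shard(stream_arn: str, shard_id: str) -> None:
    # Records within a single shard come back strictly in order.
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    while iterator:
        response = streams.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            print(record["eventName"], record["dynamodb"]["Keys"])
        if not response["Records"]:
            break  # simplification: stop once the shard yields nothing
        iterator = response.get("NextShardIterator")

def walk_shard_tree(stream_arn: str) -> None:
    # Note: describe_stream paginates via LastEvaluatedShardId; omitted here.
    shards = streams.describe_stream(StreamArn=stream_arn)[
        "StreamDescription"
    ]["Shards"]
    shard_ids = {s["ShardId"] for s in shards}
    children, roots = {}, []
    for s in shards:
        parent = s.get("ParentShardId")
        if parent in shard_ids:
            children.setdefault(parent, []).append(s["ShardId"])
        else:
            roots.append(s["ShardId"])  # no parent, or parent already trimmed
    stack = list(roots)
    while stack:
        shard_id = stack.pop()
        process_shard(stream_arn, shard_id)       # drain the parent first...
        stack.extend(children.get(shard_id, []))  # ...then visit its children
```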
answered Sep 18 '22 by cortopy