I want to put a large number of items into DynamoDB (probably around 100k per day, though this could scale upwards in the future).
A small percentage of these (perhaps 2%–5%; I'm not sure of the exact figure) will get far more hits than the others, and I won't be able to determine which ones in advance.
The hash key for each item is simply a unique positive integer (item_id), and I need the range key to be a Unix timestamp.
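For concreteness, the schema described above could be expressed as a boto3 `create_table` definition. The table and attribute names here are assumptions for illustration only:

```python
# Hypothetical table definition matching the schema described above.
# Table name and attribute names ("items", "item_id", "created_at")
# are placeholders, not anything prescribed by DynamoDB.
table_schema = {
    "TableName": "items",
    "KeySchema": [
        {"AttributeName": "item_id", "KeyType": "HASH"},    # partition (hash) key
        {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort (range) key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "item_id", "AttributeType": "N"},    # positive integer
        {"AttributeName": "created_at", "AttributeType": "N"},  # Unix timestamp
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# This dict could be passed to boto3, e.g.:
#   boto3.client("dynamodb").create_table(**table_schema)
```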
The problem is: will I run into a hot-key situation with this setup? I'm not sure whether a partition is created for every single hash key value, or whether hash keys are hashed into different partitions.
If it's the latter, I should be safe, because the items with more hits will be distributed across the partitions. But if it's the former, some partitions will get far more hits than others.
Don't be discouraged: no real DynamoDB table has the perfectly distributed access pattern the documentation suggests. You'll have some hot spots; that's normal and OK. You may have to increase your provisioned read/write throughput to accommodate the hot spots, and depending on how hot they are, that may affect your costs. But at the modest throughput levels you describe, it isn't going to make DynamoDB unusable or anything.
I recommend converting your capacity requirements into the per-second throughput metrics DynamoDB uses. Will the 100,000 writes per day really arrive evenly, at roughly 1.2 per second?
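The conversion above is simple arithmetic; the peak factor below is an assumption you'd replace with your own measured burstiness:

```python
# Back-of-the-envelope conversion from daily volume to DynamoDB's
# per-second throughput terms. The 10x peak factor is an assumed
# burstiness multiplier, not a measured value.
items_per_day = 100_000
seconds_per_day = 24 * 60 * 60  # 86,400

avg_writes_per_sec = items_per_day / seconds_per_day
peak_factor = 10  # assumption: traffic peaks at ~10x the daily average
peak_writes_per_sec = avg_writes_per_sec * peak_factor

print(round(avg_writes_per_sec, 2))   # ~1.16 writes/sec on average
print(round(peak_writes_per_sec, 1))  # ~11.6 writes/sec at the assumed peak
```

The point is that a daily total tells you very little until you know how peaked the traffic is; provisioned capacity (and hot-partition pressure) is governed by the per-second peak, not the daily average.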
Yes, the hash keys will be distributed across partitions. Partitions do not correspond to individual hash key values; they are allocations of read/write capacity and storage, and DynamoDB hashes each key to decide which partition stores it (see "Understanding Partition Behavior" in the DynamoDB documentation).
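DynamoDB's internal hash function isn't public, but the effect can be illustrated with any reasonable hash. This sketch (MD5-based bucketing, purely for illustration) shows why consecutive item_ids end up scattered across partitions rather than clustered on one:

```python
import hashlib

def bucket(item_id: int, num_partitions: int = 4) -> int:
    """Illustrative only: hash an item_id into one of num_partitions buckets.

    DynamoDB's actual internal hash function is not public; MD5 here
    just demonstrates that hashing spreads sequential keys around.
    """
    digest = hashlib.md5(str(item_id).encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Sequential ids do not land on sequential (or identical) partitions.
buckets = [bucket(i) for i in range(8)]
print(buckets)
```

Because the hot 2%–5% of item_ids are effectively random with respect to the hash, they too end up spread across partitions, which is what keeps a moderately skewed workload from concentrating on a single partition.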