I think I understand the concept of avoiding hot hashKeys so that requests use all of the partitions behind the provisioned throughput. But do UUID hashKeys do a better job of distributing across the partitions than numerically sequenced ones? In both cases, is a hashcode generated from the key and that value used to assign the item to a partition? If so, how do the hashcodes of two strings like "100444" and "100445" differ? Are they close?
"100444" and "100445" are no more likely to land in the same partition than a completely different number, like "12345" for example. Think of a DynamoDB table as a big hash table, where the hash key of the table is the key into the hash table. The underlying hash table is organized by the hash of the key, not by the key itself, so two keys that are adjacent as numbers or strings have unrelated hashes. Both sequential numbers and strings such as UUIDs distribute evenly across partitions.
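To make the hash-table analogy concrete, here is a minimal sketch in Python. DynamoDB's internal hash function and partition count are not public, so MD5 and `NUM_PARTITIONS = 8` are stand-ins chosen only for illustration, and `partition_for` and `partition_counts` are hypothetical helper names. The point is just that a good hash scatters adjacent keys as thoroughly as unrelated ones.

```python
import hashlib
from collections import Counter

# Hypothetical partition count; the real number is managed by DynamoDB.
NUM_PARTITIONS = 8


def hex_digest(key: str) -> str:
    """Hash of the key as hex. MD5 is an illustrative stand-in only."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using the hash, not the key itself."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_counts(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many of the given keys fall into each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Adjacent keys get unrelated digests.
    for key in ("100444", "100445", "12345"):
        print(key, hex_digest(key), partition_for(key))

    # Sequential keys spread roughly evenly over the partitions.
    sequential = [str(n) for n in range(100000, 108000)]
    print(sorted(partition_counts(sequential).items()))
```

Running it prints the digests for the three keys from the question, followed by the per-partition counts for 8,000 sequential keys, which should come out close to 1,000 each.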
UUIDs are useful in DynamoDB because sequential numbers are difficult to generate in a scalable way for primary keys. Random values work well as primary keys, whereas sequential values are hard to generate without gaps and at a rate that keeps up with the throughput you can provision on a DynamoDB table. When you insert new items into a DynamoDB table, you can use conditional writes to ensure that an item with that primary key value doesn't already exist.
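A conditional write with a UUID key might look roughly like the sketch below. It assumes `table` is a boto3 DynamoDB `Table` resource and that the hash key attribute is named `id`; `put_new_item` and the `key_name` parameter are hypothetical names introduced for this example. The `attribute_not_exists` condition makes the put fail instead of silently overwriting an existing item.

```python
import uuid


def put_new_item(table, attributes, key_name="id"):
    """Insert an item under a fresh UUID key, refusing to overwrite.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    `key_name` is the (assumed) name of the table's hash key attribute.
    """
    item = dict(attributes)              # copy so the caller's dict is untouched
    item[key_name] = str(uuid.uuid4())   # random key, no central counter needed

    # The condition is checked atomically by DynamoDB. If an item with this
    # key already exists, boto3 raises a ClientError whose error code is
    # ConditionalCheckFailedException, and nothing is written.
    table.put_item(
        Item=item,
        ConditionExpression="attribute_not_exists(#k)",
        ExpressionAttributeNames={"#k": key_name},
    )
    return item[key_name]
```

With boto3 installed and credentials configured, this could be called as `put_new_item(boto3.resource("dynamodb").Table("my-table"), {"name": "example"})`, where `my-table` is a placeholder table name.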
(Note: this question is also cross-posted and discussed in this AWS Forums post.)