
Binary vs. string vs. number for storing a UUID in a DynamoDB partition key?

I'm trying to decide whether to use binary, number, or string for my DynamoDB table's partition key. My application is a React.js/Node.js social event-management application in which as much as half of the data stored in DynamoDB will represent relationships between Items and Attributes and other Items and Attributes: for example, the friends of a user, the attendees at an event, etc.

Because the schema is so key-heavy, because the maximum DynamoDB Item size is only 400 KB, and for performance and cost reasons, I'm concerned about keys taking up too much space. That said, I want to use UUIDs for partition keys. There are well-known reasons to prefer UUIDs (or something with similar entropy and a minimal chance of collisions) for distributed, serverless apps where multiple nodes are handing out new keys.

So, I think my choices are:

  1. Use a hex-encoded UUID (32 bytes stored, after the dashes are removed)
  2. Encode the UUID using base64 (22 bytes, with the trailing == padding stripped)
  3. Encode the UUID using z85 (20 bytes)
  4. Use a binary-typed attribute for the key (16 bytes)
  5. Use a number-typed attribute for the key (16-18 bytes?). The Number type can only accommodate 127 bits, so I'd have to perform some tricks like stripping a version bit, but for my app that's probably OK. See "How many bits of integer data can be stored in a DynamoDB attribute of type Number?" for more info.
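For a rough feel of options 1 to 4, here is a minimal Python sketch that renders one UUID in each form and measures it. DynamoDB counts strings by their UTF-8 byte length, and all of these encodings are plain ASCII, so character count equals byte count. The Z85 encoder is written out by hand following the ZeroMQ Z85 alphabet; `z85_encode` and `key_encodings` are illustrative helpers, not library functions, and URL-safe base64 is an arbitrary choice (standard base64 gives the same length).

```python
import base64
import uuid

# Z85 alphabet from the ZeroMQ Z85 spec: 85 printable ASCII characters.
Z85_ALPHABET = (
    "0123456789abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#"
)


def z85_encode(data: bytes) -> str:
    """Encode bytes as Z85 (5 characters per 4 bytes)."""
    if len(data) % 4:
        raise ValueError("Z85 input length must be a multiple of 4")
    out = []
    for i in range(0, len(data), 4):
        value = int.from_bytes(data[i:i + 4], "big")
        chunk = []
        for _ in range(5):
            value, rem = divmod(value, 85)
            chunk.append(Z85_ALPHABET[rem])
        out.extend(reversed(chunk))  # most significant base-85 digit first
    return "".join(out)


def key_encodings(u: uuid.UUID) -> dict:
    """Return each candidate key representation of one UUID."""
    raw = u.bytes  # the 16 raw bytes a Binary attribute would hold
    return {
        "hex": u.hex,  # 32 hex characters, no dashes
        "base64": base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii"),  # 22 chars
        "z85": z85_encode(raw),  # 20 characters
        "binary": raw,  # 16 bytes
    }


if __name__ == "__main__":
    for name, value in key_encodings(uuid.uuid4()).items():
        print(f"{name:>7}: {len(value):2d} bytes  {value!r}")
```

The measured lengths are 32, 22, 20 and 16, matching the estimates in the list.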

Obviously there's a tradeoff in developer experience. A hex string is the clearest but also the largest. Encoded strings are smaller but harder to deal with in logs, while debugging, etc. Binary and Number are harder to work with than strings, but they are the smallest.
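The Number option is the least obvious one, so here is a minimal sketch of the bit-stripping trick from option 5, assuming version-4 (random) UUIDs. Rather than stripping a single bit, it drops all six fixed bits (the 4-bit version and the 2-bit variant), leaving a 122-bit integer that fits comfortably in a Number, and re-inserts them on the way back out. The size estimate uses the approximation given in the DynamoDB item-size documentation (about one byte per two significant digits, plus one byte); all helper names are hypothetical.

```python
import uuid

LOW_BITS = 62   # bits below the 2 variant bits
MID_BITS = 12   # bits between the variant bits and the 4 version bits
LOW_MASK = (1 << LOW_BITS) - 1
MID_MASK = (1 << MID_BITS) - 1


def uuid4_to_number(u: uuid.UUID) -> int:
    """Pack a v4 UUID into a 122-bit integer by dropping its fixed bits."""
    if u.version != 4 or u.variant != uuid.RFC_4122:
        raise ValueError("expected an RFC 4122 version-4 UUID")
    n = u.int
    high = n >> 80               # top 48 bits: time_low and time_mid
    mid = (n >> 64) & MID_MASK   # 12 bits of time_hi below the version nibble
    low = n & LOW_MASK           # 62 bits below the variant bits
    return (high << (MID_BITS + LOW_BITS)) | (mid << LOW_BITS) | low


def number_to_uuid4(n: int) -> uuid.UUID:
    """Reverse uuid4_to_number by re-inserting version 0100 and variant 10."""
    low = n & LOW_MASK
    mid = (n >> LOW_BITS) & MID_MASK
    high = n >> (MID_BITS + LOW_BITS)
    return uuid.UUID(int=(high << 80) | (0x4 << 76) | (mid << 64) | (0b10 << 62) | low)


def approx_number_size(n: int) -> int:
    """Approximate stored size of a DynamoDB Number, per the item-size docs."""
    significant = str(abs(n)).rstrip("0") or "0"  # trailing zeros are trimmed
    return (len(significant) + 1) // 2 + 1        # 1 byte per 2 digits, plus 1


if __name__ == "__main__":
    u = uuid.uuid4()
    n = uuid4_to_number(u)
    print(u, n, n.bit_length(), approx_number_size(n))
```

Under that approximation, a random 122-bit value (36 or 37 decimal digits for the vast majority of UUIDs) costs roughly 19 to 20 bytes, so in practice a Number key may be no smaller than a z85 string and larger than a 16-byte Binary value.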

I'm sure I'm not the first person to think about these tradeoffs. Is there a well-known best practice or heuristic to determine how UUID keys should be stored in DynamoDB?

If not, then I'm leaning towards the Binary type, because it takes the least storage and because its native representation (a base64-encoded string) can be used everywhere humans need to view and reason about keys, including queries, logging, and client code. Other than having to transform it to/from a Buffer if I use DocumentClient, am I missing some problem with the Binary type, or some advantage of one of the other options in the list above?
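A minimal sketch of the Binary round trip, assuming the raw DynamoDB JSON wire format, where a Binary value travels as a base64 string under the `B` type descriptor (SDKs such as boto3 accept raw bytes instead and do the base64 step themselves). The helper names are hypothetical.

```python
import base64
import uuid


def uuid_to_binary_attr(u: uuid.UUID) -> dict:
    """Wrap a UUID's 16 raw bytes as a DynamoDB JSON Binary attribute."""
    return {"B": base64.b64encode(u.bytes).decode("ascii")}


def binary_attr_to_uuid(attr: dict) -> uuid.UUID:
    """Decode a {"B": "<base64>"} attribute back into a UUID."""
    raw = base64.b64decode(attr["B"], validate=True)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    u = uuid.uuid4()
    attr = uuid_to_binary_attr(u)
    print(attr["B"], "<->", binary_attr_to_uuid(attr))
```

Standard base64 of 16 bytes is 24 characters including `==` padding, and it does not resemble the dashed UUID text, so logs or tools that show the canonical form would still need a conversion like `binary_attr_to_uuid`.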

If it matters, I'm planning for all access to DynamoDB to happen via a Lambda API, so even if there's conversion or marshalling required, that's OK because I can do it inside my API.

BTW, this question is a sequel to a 4-year-old question ("UUID data type in DynamoDB"), but 4 years is a looooooong time in a fast-evolving space, so I figured it was worth asking again.

Justin Grant asked Oct 30 '18 04:10




1 Answer

I had a similar issue and concluded that the size of the key did not matter much, as all my options were small and lightweight, with only minor tradeoffs. I decided that the programmer-friendly way (i.e., friendly to me) was to use the 'sub', the unique identifier (a UUID) that Cognito creates for each user. That way any collision issues, should they arise, would also be handled by Cognito. I could then choose whether or not to encode it. So however a user logs in, they end up with the same 'sub'; I match that against the hash (partition) key of the records in DynamoDB, which immediately grants them fine-grained access to only their own data. Three years later, I have found this to be a very reliable method.
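A minimal sketch of this answer's approach, written as the JSON request body for the DynamoDB `Query` action. It assumes the token's claims have already been verified upstream (for example by an API Gateway Cognito authorizer), and it uses a hypothetical table name and a hypothetical partition-key attribute `pk`. The `binary_key` flag covers the "encode or not encode" choice by sending the sub either as its UUID string or as 16 raw bytes in base64 form.

```python
import base64
import uuid


def build_user_query(claims: dict, table_name: str, binary_key: bool = False) -> dict:
    """Build a Query request body that only reads the caller's own partition.

    `claims` is assumed to be already verified; the key value comes only from
    the token, never from client-supplied parameters.
    """
    sub = uuid.UUID(claims["sub"])  # a Cognito user pool 'sub' is a UUID string
    if binary_key:
        value = {"B": base64.b64encode(sub.bytes).decode("ascii")}
    else:
        value = {"S": str(sub)}
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "pk"},  # hypothetical attribute name
        "ExpressionAttributeValues": {":pk": value},
    }


if __name__ == "__main__":
    print(build_user_query({"sub": str(uuid.uuid4())}, "Events"))
```

The key design point is that the partition-key value is taken from the verified token rather than from the request, so a caller cannot use this code path to query another user's partition.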

David White answered Oct 23 '22 01:10