
How do I query DynamoDB when I want to consider the sort key but not the partition key?

I can't figure out how to do this in DynamoDB.

I have a table with data something like this:

ID    Updated     other fields...

1200  2017-12-11              ...
1201  2018-02-05              ...
1205  2018-01-05              ...
1206  2018-01-11              ...
1210  2018-02-15              ...
1212  2018-02-10              ...

The partition key is 'ID' and I have a sort key of 'Updated'.

I want to retrieve the records where Updated is greater than "2018-02-01", say.

I can't query on 'Updated' alone; it fails with the error "Query condition missed key schema element: ID". I understand what that means, but I'm not sure how to do this properly.

I've tried adding various indexes and then querying on the index, including one with only the 'Updated' field as the partition key, but then I can't query for a range of values, only for an exact match on the partition key.

So, how do I query across multiple partitions for a condition?

I could use a scan, but that is potentially expensive. Can I do this by indexing it a certain way? Or is there a way to do something similar to a query where I don't need to specify the partition key?

Octopus asked Mar 27 '18 21:03




3 Answers

Use a scan

Almost everyone using DynamoDB seems to worry about scans. Scans are FINE in many circumstances. Questions you should ask yourself include: how much data will I have, how will it grow over time, how fast do I need the scan to complete, and how many RCUs will it cost? Don't just dismiss scans; do the maths.
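To make "do the maths" concrete, here is a rough sketch, not a definitive implementation. It assumes 'Updated' is stored as a string in YYYY-MM-DD form (so string comparison matches date order) and that `client` is a low-level boto3 DynamoDB client created elsewhere, e.g. `boto3.client("dynamodb")`. The function names and the table name "Records" are made up for illustration.

```python
import math


def estimate_scan_rcus(table_bytes: int, eventually_consistent: bool = True) -> float:
    """Rough read capacity units consumed by scanning a whole table.

    A scan is charged for the data it reads, not for what the filter returns:
    one RCU covers 4 KB read strongly consistent, or 8 KB eventually consistent.
    """
    units = math.ceil(table_bytes / 4096)
    return units / 2 if eventually_consistent else float(units)


def scan_updated_after(client, table_name: str, cutoff: str) -> list:
    """Scan the table page by page, keeping items whose Updated value is > cutoff."""
    kwargs = {
        "TableName": table_name,
        "FilterExpression": "#u > :cutoff",
        # Placeholder for the attribute name, in case it clashes with a reserved word.
        "ExpressionAttributeNames": {"#u": "Updated"},
        "ExpressionAttributeValues": {":cutoff": {"S": cutoff}},
    }
    items = []
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # More data remains: continue the scan from where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Something like `scan_updated_after(client, "Records", "2018-02-01")` would return the matching items, and `estimate_scan_rcus(table_size_in_bytes)` gives a ballpark for what each full scan costs. The filter only reduces what is returned; the whole table is still read and billed.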

Archive data

If you only need to access recent data, consider deleting or archiving old data. A scan reads every item in the table, so removing old items makes scans both faster and cheaper.

Partition by date

There are various strategies you can use to improve your table performance if you really want to use a query. For example, you could have a partition key of YYYY-MM and a sort key of datetime (down to the nanosecond). That way you can retrieve a whole month of data in one query, while still being able to use the sort key to select specific date ranges within that month. This kind of query is more complicated to handle in your application than a scan, because a range that spans several months needs one query per month. Architecting your tables really depends on your data access patterns.
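As a minimal sketch of the month-partitioned layout, assuming hypothetical attribute names `Month` and `UpdatedAt` and timestamps stored as UTC ISO strings: the first function derives both key values for an item, the second builds the arguments for a low-level boto3 `client.query(...)` call covering a range inside one month. Python's `datetime` only carries microseconds, so this stops short of the nanosecond precision mentioned above.

```python
from datetime import datetime, timezone


def month_keys(ts: datetime) -> dict:
    """Derive the partition key (YYYY-MM) and sort key (ISO timestamp) for an item.

    Expects a timezone-aware datetime; it is normalised to UTC first.
    """
    ts = ts.astimezone(timezone.utc)
    return {
        "Month": {"S": ts.strftime("%Y-%m")},
        "UpdatedAt": {"S": ts.strftime("%Y-%m-%dT%H:%M:%S.%fZ")},
    }


def month_range_query(table_name: str, month: str, start: str, end: str) -> dict:
    """Build query arguments for items in one month with start <= UpdatedAt <= end.

    The bounds are compared as strings, so they should use the same
    ISO format as the stored sort key values.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#m = :m AND #t BETWEEN :start AND :end",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#m": "Month", "#t": "UpdatedAt"},
        "ExpressionAttributeValues": {
            ":m": {"S": month},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

The resulting dict would be passed as `client.query(**month_range_query(...))`. Because the partition key is an exact match on the month, the range condition on the sort key is allowed.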

F_SO_K answered Oct 06 '22 15:10


Nice problem, not so nice solution! :)

• You cannot run a query without a condition on the partition key.
• You need the Updated column to be a sort key, either in the table schema or in an index. If it is not a sort key, you won't be able to efficiently query for Updated > VALUE.

So you need a constant partition key, with Updated as the sort key. Here is your Global Secondary Index:
• PK: ConstantColumn
• SK: Updated

Of course, you'll lose some scalability because the entire index will live in one partition, but a KEYS_ONLY projection keeps the index small and should give you enough room.

Should you really need more scalability, consider having PK values like C0, C1, ..., Cn. Run one query per partition key value, then merge the results (divide et impera).
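The C0, C1, ..., Cn idea could look roughly like the sketch below. The attribute name `Shard`, the shard count, and the CRC32-based assignment are all assumptions made for illustration; `client` is a low-level boto3 DynamoDB client created elsewhere. It assumes Updated is a string attribute and, with a KEYS_ONLY projection, each returned item still carries the table keys (ID and Updated).

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of constant partition key values


def shard_for(item_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Pick a stable shard label (C0, C1, ...) to store on the item at write time."""
    return f"C{zlib.crc32(item_id.encode('utf-8')) % num_shards}"


def query_all_shards(client, table_name: str, index_name: str, cutoff: str,
                     num_shards: int = NUM_SHARDS) -> list:
    """Query every shard of the index for Updated > cutoff and merge the results."""
    items = []
    for shard in range(num_shards):
        kwargs = {
            "TableName": table_name,
            "IndexName": index_name,
            "KeyConditionExpression": "#s = :s AND #u > :cutoff",
            "ExpressionAttributeNames": {"#s": "Shard", "#u": "Updated"},
            "ExpressionAttributeValues": {
                ":s": {"S": f"C{shard}"},
                ":cutoff": {"S": cutoff},
            },
        }
        while True:  # follow pagination within this shard
            page = client.query(**kwargs)
            items.extend(page.get("Items", []))
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    # Each shard comes back ordered by Updated; re-sort to get one global order.
    items.sort(key=lambda item: item["Updated"]["S"])
    return items
```

With `num_shards=1` this collapses to the single constant partition key described above. The per-shard queries are independent, so they could also be issued in parallel.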

Costin answered Oct 06 '22 14:10


I would consider alternative partition keys. For example, will your business logic work if you create a GSI with year as partition key and date as sort key? How about year-month?

Your query logic will be more complex to write, as you might have to issue multiple queries covering more than one partition to fill your result page.

But as you pointed out, this is cheaper than performing a full table scan.
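A hedged sketch of that multi-partition loop, assuming a GSI whose partition key is a hypothetical `YearMonth` attribute (YYYY-MM) and whose sort key is Updated stored as YYYY-MM-DD. `client` is a low-level boto3 DynamoDB client created elsewhere. It walks the months in order and stops once the page is full; returning a continuation point for the next page is left out to keep it short.

```python
from datetime import date


def months_between(start: date, end: date) -> list:
    """List YYYY-MM partition key values from start's month to end's month, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def query_date_range(client, table_name: str, index_name: str,
                     start: date, end: date, page_size: int) -> list:
    """Collect up to page_size items with start <= Updated <= end, month by month."""
    items = []
    for year_month in months_between(start, end):
        kwargs = {
            "TableName": table_name,
            "IndexName": index_name,
            "KeyConditionExpression": "#ym = :ym AND #u BETWEEN :start AND :end",
            "ExpressionAttributeNames": {"#ym": "YearMonth", "#u": "Updated"},
            "ExpressionAttributeValues": {
                ":ym": {"S": year_month},
                ":start": {"S": start.isoformat()},
                ":end": {"S": end.isoformat()},
            },
        }
        while len(items) < page_size:
            # Only ask for as many items as are still needed to fill the page.
            kwargs["Limit"] = page_size - len(items)
            page = client.query(**kwargs)
            items.extend(page.get("Items", []))
            if "LastEvaluatedKey" not in page:
                break  # this month is exhausted, move on to the next one
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
        if len(items) >= page_size:
            break
    return items
```

Months with no matching items still cost one query each, which is worth weighing when choosing between year and year-month as the partition key.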

gerrytan answered Oct 06 '22 13:10