I have a DynamoDB table that stores historical run data for processes that run on my server. I need a place where I can aggregate these processes and see the data for the latest run of each. Each process has its own ProcessId, which is the partition key of the table. The sort key is the StartDateTime:
{
ProcessId, // Partition Key
StartDateTime, // Sort Key
... // More data
}
Essentially, I need to retrieve the item with the most recent StartDateTime for each ProcessId that I give. I'm using a Node.js Lambda with the aws-sdk to retrieve the data. I've looked into BatchGetItem, but my understanding is that for tables with a partition key and a sort key you need to provide both to retrieve an item. I've also looked into Query, but I would need to run a separate query for each partition, which is less than ideal. Does anyone know of a way to make this request in one call rather than one call per partition?
You can Query any table or secondary index, provided that it has a composite primary key (partition key and sort key). Query operations consume read capacity units: a Query on a table consumes the table's provisioned read capacity, and a Query on a global secondary index consumes the index's provisioned read capacity.
How many sort keys can a DynamoDB table have? Only one sort key can be defined per table, but its value can be composed from several attributes, for example by concatenating them into a single string.
The sort key of an item is also known as its range attribute. The term range attribute derives from the way DynamoDB stores items with the same partition key physically close together, in sorted order by the sort key value. Each primary key attribute must be a scalar (meaning that it can hold only a single value).
In Amazon DynamoDB, you can use either the DynamoDB API or PartiQL, a SQL-compatible query language, to query items from a table. The Query action provides quick, efficient access to the physical locations where the data is stored.
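Because items that share a partition key are stored in sort key order, a Query can read a partition newest-first and stop after one item. The following is a minimal sketch of the "one Query per ProcessId" approach mentioned in the question, run in parallel. The table name and the injected `runQuery` function are assumptions for illustration; with the aws-sdk v2 DocumentClient, `runQuery` could be `(p) => documentClient.query(p).promise()`.

```javascript
// Build Query parameters that return only the newest item of one partition.
function latestRunParams(tableName, processId) {
  return {
    TableName: tableName,
    KeyConditionExpression: "ProcessId = :pid",
    ExpressionAttributeValues: { ":pid": processId },
    ScanIndexForward: false, // descending sort key order: newest StartDateTime first
    Limit: 1, // stop after the first (newest) item
  };
}

// Run one Query per ProcessId in parallel and collect the newest item of each.
// `runQuery` is an injected function: (params) => Promise<{ Items: [...] }>.
async function latestRuns(runQuery, tableName, processIds) {
  const results = await Promise.all(
    processIds.map((id) => runQuery(latestRunParams(tableName, id)))
  );
  // Processes without any run yet return an empty page and are dropped here.
  return results.map((r) => r.Items[0]).filter((item) => item !== undefined);
}
```

This is still one request per partition, but the requests run concurrently and each one reads a single item, so it may be good enough when the list of ProcessIds is small.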
To sum up what I understood from your post, you may have data like this in your table:
PK (id)     SK (timestamp)    Other data
process1    1                 ...
process2    4                 ...
process1    8                 ...
process3    18                ...
process2    25                ...
What you need is an easy way to retrieve:
process1    8                 ...
process2    25                ...
process3    18                ...
As sandboxbohemian said, I suggest using a DynamoDB stream to trigger a Lambda function each time a new item arrives. However, I would stay in the same table and upsert one extra item per process, with the same id and a timestamp equal to 0. On that item I also set an attribute "latest" that always has the value "true" (stored as a string, since index key attributes must be of type String, Number or Binary) and a number attribute "timestamp2" that holds the timestamp of the newest run. Chronologically, the entries would be:
PK (id)     SK (timestamp)    Other data    timestamp2 (GSI SK)    latest (GSI PK)
process1    1                 ...
process1    0                 ...           1                      true
process2    4                 ...
process2    0                 ...           4                      true
process1    8                 ...
process1    0                 ...           8                      true
process3    18                ...
process3    0                 ...           18                     true
process2    25                ...
process2    0                 ...           25                     true
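The upsert shown above could be sketched as a stream-triggered handler like the one below. It assumes the stream view type includes the new image (NEW_IMAGE or NEW_AND_OLD_IMAGES) and uses the attribute names from the table above. The function names and the injected `putItem` are hypothetical; with the aws-sdk v2 low-level client, `putItem` could be `(p) => dynamodb.putItem(p).promise()`. Stream images use the low-level attribute format, e.g. `{ N: "8" }`.

```javascript
// Turn one DynamoDB Streams record into PutItem parameters for the "latest"
// marker item (timestamp = 0). Returns null when the record should be ignored.
function markerPutParams(tableName, record) {
  // Only new runs matter; later overwrites of the marker arrive as MODIFY events.
  if (record.eventName !== "INSERT") return null;
  const image = record.dynamodb.NewImage;
  // Skip the first insert of the marker itself, otherwise the handler's own
  // write would trigger it again in a loop.
  if (image.timestamp.N === "0") return null;
  return {
    TableName: tableName,
    Item: {
      ...image, // copy id and the other data of the new run
      timestamp: { N: "0" }, // fixed sort key, so each put overwrites the marker
      timestamp2: image.timestamp, // GSI sort key: timestamp of the newest run
      latest: { S: "true" }, // GSI partition key, stored as a string
    },
  };
}

// Process the records in order so the last write wins for each id.
// `putItem` is an injected function: (params) => Promise.
async function handleStreamEvent(putItem, tableName, event) {
  for (const record of event.Records) {
    const params = markerPutParams(tableName, record);
    if (params !== null) await putItem(params);
  }
}
```

This sketch does not guard against records that arrive out of order; a condition expression on timestamp2 would be one way to add that, at the cost of handling failed conditional writes.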
Then you have to create a GSI with "latest" as its PK and "timestamp2" as its SK, and project the "id" and "data" attributes. It will be a sparse index, meaning that only items that have the "latest" attribute filled in will be present. Its content would be:
latest (GSI PK)    timestamp2 (GSI SK)    id          timestamp    Data
true               8                      process1    0            ...
true               25                     process2    0            ...
true               18                     process3    0            ...
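As a rough sketch, the index above could be added to an existing table with UpdateTable parameters like these. The index name and the list of projected attributes are placeholders, and the sketch assumes an on-demand table (a provisioned table would also need a ProvisionedThroughput entry for the index). With the aws-sdk v2 low-level client, the result could be passed to `dynamodb.updateTable(params).promise()`.

```javascript
// Build UpdateTable parameters that add the sparse "latest" GSI.
// `indexName` and `nonKeyAttributes` are chosen by the caller (hypothetical values).
function createLatestIndexParams(tableName, indexName, nonKeyAttributes) {
  return {
    TableName: tableName,
    // Key attributes of the new index must be declared with their types.
    AttributeDefinitions: [
      { AttributeName: "latest", AttributeType: "S" },
      { AttributeName: "timestamp2", AttributeType: "N" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: indexName,
          KeySchema: [
            { AttributeName: "latest", KeyType: "HASH" }, // GSI partition key
            { AttributeName: "timestamp2", KeyType: "RANGE" }, // GSI sort key
          ],
          // The table's own key attributes (id, timestamp) are always projected.
          Projection: { ProjectionType: "INCLUDE", NonKeyAttributes: nonKeyAttributes },
        },
      },
    ],
  };
}
```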
As you can see, the GSI PK always has the same value, so the index can be read with either a query or a scan. If you need the latest run of every process, you can scan the index. If the number of processes is really high, you can instead query the index with latest = "true" and take advantage of the sort order on timestamp2.
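Reading the index with a query could look roughly like this. The index name is a placeholder, and `runQuery` is again an injected function, e.g. `(p) => documentClient.query(p).promise()` with the aws-sdk v2 DocumentClient. The loop follows LastEvaluatedKey because a single Query response is limited in size and may not contain every item.

```javascript
// Build Query parameters for one page of the "latest" index, newest run first.
function latestIndexParams(tableName, indexName, startKey) {
  const params = {
    TableName: tableName,
    IndexName: indexName,
    KeyConditionExpression: "#latest = :flag",
    // A name placeholder avoids any clash with DynamoDB reserved words.
    ExpressionAttributeNames: { "#latest": "latest" },
    ExpressionAttributeValues: { ":flag": "true" },
    ScanIndexForward: false, // descending timestamp2
  };
  if (startKey !== undefined) params.ExclusiveStartKey = startKey;
  return params;
}

// Collect the latest run of every process, following pagination.
// `runQuery` is an injected function: (params) => Promise<{ Items, LastEvaluatedKey }>.
async function allLatest(runQuery, tableName, indexName) {
  const items = [];
  let startKey;
  do {
    const page = await runQuery(latestIndexParams(tableName, indexName, startKey));
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // undefined once the last page is reached
  } while (startKey !== undefined);
  return items;
}
```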
I agree this schema is not intuitive, but that is often the case with DynamoDB.