DynamoDb: How to retrieve the first item (by sort key) for each of a given list of partition keys

I have a DynamoDB table that stores historical run data for processes that run on my server. I need a place where I can aggregate these processes and see the data for the latest run of each. Each process has its own ProcessId, which is the partition key for the table. The sort key is the StartDateTime:

{
  ProcessId, // Partition Key
  StartDateTime, // Sort Key
  ... // More data
}

Essentially, I need to retrieve the item with the most recent StartDateTime for each ProcessId that I give. I'm using a Node.js Lambda with the aws-sdk to retrieve the data. I've looked into using BatchGetItem, but my understanding is that for tables with a partition key and a sort key, you need to provide both to retrieve an item. I've also looked into using a Query, but I would need to run a separate query for each partition, which is less than ideal. Does anyone know of a way I can make this request in one call rather than making a separate call per partition?

254

asked Jan 10 '20 00:01

Luke

1 Answer

To sum up what I understood from your post, you may have data like this in your table:

PK (id)         SK (timestamp)    Other data
process1        1                 ...
process2        4                 ...
process1        8                 ...
process3        18                ...
process2        25                ...

You need an easy way to retrieve:

process1        8                 ...
process2        25                ...
process3        18                ...
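
For comparison, the per-partition approach mentioned in the question could look roughly like the sketch below: one Query per ProcessId, read in descending sort-key order and limited to one item, with all queries issued in parallel. It assumes the aws-sdk v2 DocumentClient; the table name ProcessRuns and the helper names are hypothetical.

```javascript
// Hypothetical table name; replace with your own.
const TABLE_NAME = "ProcessRuns";

// Build Query parameters that return only the newest item of one partition.
function buildLatestQuery(processId) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: "ProcessId = :pid",
    ExpressionAttributeValues: { ":pid": processId },
    ScanIndexForward: false, // descending by sort key (StartDateTime)
    Limit: 1, // stop after the first (most recent) item
  };
}

// docClient is assumed to be an AWS.DynamoDB.DocumentClient instance (aws-sdk v2),
// e.g. new (require("aws-sdk").DynamoDB.DocumentClient)().
async function getLatestRuns(docClient, processIds) {
  const results = await Promise.all(
    processIds.map((id) => docClient.query(buildLatestQuery(id)).promise())
  );
  // Drop processes that have no items at all.
  return results.map((r) => r.Items[0]).filter((item) => item !== undefined);
}
```

This still costs one request per ProcessId, which is what the rest of this answer tries to avoid.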

As sandboxbohemian said, I suggest using a DynamoDB stream to trigger a Lambda function each time a new item arrives. However, I would use the same table and upsert an item with the same id and a timestamp equal to 0. In addition, I would add a "latest" attribute that is always set to "true" (stored as a string, since index key attributes must be strings, numbers, or binary) and a number attribute, timestamp2, holding the current timestamp. Chronologically, the entries would be:

PK (id)         SK (timestamp)    Other data      timestamp2 (GSI SK)  latest (GSI PK)
process1        1                 ...
process1        0                 ...             1                    true
process2        4                 ...
process2        0                 ...             4                    true
process1        8                 ...
process1        0                 ...             8                    true
process3        18                ...
process3        0                 ...             18                   true
process2        25                ...
process2        0                 ...             25                   true
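
The upsert of the timestamp-0 item could be sketched as a stream-triggered handler like the one below. This is only an illustration under several assumptions: the attribute names ProcessId and StartDateTime come from the question, ProcessId is a string and StartDateTime a number, the stream view type includes the new image, and the table name and helper names are hypothetical.

```javascript
const TABLE_NAME = "ProcessRuns"; // hypothetical table name

// Turn one DynamoDB stream record into Put parameters for the pointer item,
// or return null when the record should be ignored.
function buildPointerPut(record) {
  if (record.eventName !== "INSERT") return null;
  const image = record.dynamodb.NewImage;
  const processId = image.ProcessId.S;
  const startDateTime = Number(image.StartDateTime.N);
  // The pointer item itself has sort key 0; skip it so the function
  // does not react to its own writes to the same table.
  if (startDateTime === 0) return null;
  return {
    TableName: TABLE_NAME,
    Item: {
      ProcessId: processId,
      StartDateTime: 0, // fixed sort key of the pointer item
      latest: "true", // GSI partition key, stored as a string
      timestamp2: startDateTime, // GSI sort key
    },
    // Only move the pointer forward, in case records arrive out of order.
    ConditionExpression: "attribute_not_exists(#ts) OR #ts < :ts",
    ExpressionAttributeNames: { "#ts": "timestamp2" },
    ExpressionAttributeValues: { ":ts": startDateTime },
  };
}

// docClient is assumed to be an AWS.DynamoDB.DocumentClient (aws-sdk v2).
async function handleStreamEvent(docClient, event) {
  for (const record of event.Records) {
    const params = buildPointerPut(record);
    if (params === null) continue;
    try {
      await docClient.put(params).promise();
    } catch (err) {
      // A failed condition means a newer pointer already exists.
      if (err.code !== "ConditionalCheckFailedException") throw err;
    }
  }
}
```

In a Lambda, this might be wired up as exports.handler = (event) => handleStreamEvent(docClient, event). The sketch copies only the key attributes; the other data would have to be copied from the new image as well if it should appear on the pointer item.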

Then you have to create a GSI with "latest" as its PK and "timestamp2" as its SK, and project the "id" and "data" attributes. It will be a sparse index, meaning that only items with the "latest" attribute filled in will be present. Its content is shown here:

latest (GSI PK) timestamp2 (GSI SK)   id        timestamp   Data
true            8                     process1  0           ...
true            25                    process2  0           ...    
true            18                    process3  0           ...
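
As a rough sketch, the index described above could be added with UpdateTable parameters like the following. The index name LatestIndex and the helper name are hypothetical, and "latest" is assumed to be stored as a string and "timestamp2" as a number.

```javascript
// Build UpdateTable parameters that add the sparse "latest" index.
function buildCreateLatestIndexParams(tableName) {
  return {
    TableName: tableName,
    // Only the new index key attributes need to be declared here.
    AttributeDefinitions: [
      { AttributeName: "latest", AttributeType: "S" },
      { AttributeName: "timestamp2", AttributeType: "N" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: "LatestIndex", // hypothetical index name
          KeySchema: [
            { AttributeName: "latest", KeyType: "HASH" },
            { AttributeName: "timestamp2", KeyType: "RANGE" },
          ],
          // "ALL" copies every attribute of the pointer item into the index;
          // use "INCLUDE" with NonKeyAttributes to project fewer.
          Projection: { ProjectionType: "ALL" },
        },
      },
    ],
  };
}
```

The result would be passed to updateTable on an AWS.DynamoDB client (aws-sdk v2). For a table in provisioned capacity mode, the Create block also needs a ProvisionedThroughput setting. The table's own key attributes are projected into the index automatically.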

As you can see, the GSI PK always has the same value, so the index can be read with either a scan or a query. If you need the latest entry of every process, you can scan the index. If the number of processes is really high, you can query the index with latest = "true" and take advantage of the sort order on timestamp2.
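
Querying the index could be sketched as below. It is a minimal example, assuming the aws-sdk v2 DocumentClient and hypothetical table and index names (ProcessRuns, LatestIndex); it follows LastEvaluatedKey so that results spanning several pages are all collected.

```javascript
const TABLE_NAME = "ProcessRuns"; // hypothetical table name
const INDEX_NAME = "LatestIndex"; // hypothetical index name

// Build Query parameters for one page of the sparse index, newest first.
function buildLatestIndexQuery(exclusiveStartKey) {
  const params = {
    TableName: TABLE_NAME,
    IndexName: INDEX_NAME,
    KeyConditionExpression: "#latest = :t",
    ExpressionAttributeNames: { "#latest": "latest" },
    ExpressionAttributeValues: { ":t": "true" },
    ScanIndexForward: false, // descending by timestamp2
  };
  if (exclusiveStartKey) params.ExclusiveStartKey = exclusiveStartKey;
  return params;
}

// docClient is assumed to be an AWS.DynamoDB.DocumentClient (aws-sdk v2).
async function getAllLatest(docClient) {
  const items = [];
  let startKey;
  do {
    const page = await docClient.query(buildLatestIndexQuery(startKey)).promise();
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // undefined once the last page is read
  } while (startKey);
  return items;
}
```

The result covers every process, so a given list of ids would still have to be filtered in code. Because the pointer items have a fixed sort key of 0, a BatchGetItem on the keys (id, 0) may be another option when the list of ids is known in advance.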

I agree this schema is not intuitive, but that is often the case with DynamoDB.

149

answered Oct 05 '22 20:10

ben11

Tags: amazon-web-services, amazon-dynamodb, aws-sdk, aws-sdk-nodejs, dynamodb-queries
