I don't get the concept of limits for query/scan in DynamoDB. According to the docs:
A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression is applied to the results.
Let's say I have 10k items, 250 KB per item, and all of them match the query params.
DynamoDB item size limit. The first important limit to know is the item size limit. An individual record in DynamoDB is called an item, and a single DynamoDB item cannot exceed 400KB. While 400KB is large enough for most normal database operations, it is significantly lower than the other options.
According to the documentation, an "item" can have a maximum size of 400 KB, which severely limits the maximum number of log elements that can be stored.
Maximum length of 255. The condition that specifies the key values for items to be retrieved by the Query action. The condition must perform an equality test on a single partition key value. The condition can optionally perform one of several comparison tests on a single sort key value.
DynamoDB is a key-value and document database that can support tables of virtually any size with horizontal scaling. This enables DynamoDB to scale to more than ten trillion requests per day with peaks greater than 20 million requests per second, over petabytes of storage.
In a Query operation, DynamoDB retrieves the items in sorted order, and then processes the items using KeyConditionExpression and any FilterExpression that might be present. Only then are the Query results sent back to the client. A Query operation always returns a result set. If no matching items are found, the result set is empty.
For items with a given partition key value, DynamoDB stores these items close together, in sorted order by sort key value.
The Query operation allows you to limit the number of items that it reads. To do this, set the Limit parameter to the maximum number of items that you want. For example, suppose that you Query a table, with a Limit value of 6, and without a filter expression.
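As a rough illustration of the Limit parameter described above, here is a minimal sketch of adding it to a set of query params. `withLimit` is a hypothetical helper, not part of any SDK, and the table name and key values in the usage lines are placeholders. Note that Limit caps the number of items DynamoDB evaluates, so with a filter expression the number of items returned can be smaller.

```javascript
// Hypothetical helper: returns a copy of the query params with Limit set.
// Limit is the maximum number of items DynamoDB evaluates for the request.
function withLimit(params, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('Limit must be a positive integer');
  }
  return { ...params, Limit: limit };
}

// Usage with placeholder names: ask for at most 6 items per request.
const limited = withLimit(
  {
    TableName: 'example-table', // placeholder
    KeyConditionExpression: 'pk = :p',
    ExpressionAttributeValues: { ':p': { S: 'example-key' } }, // placeholder
  },
  6
);
console.log(limited.Limit);
```

If more matching items exist beyond the limit, the response still carries a LastEvaluatedKey, so the same paging approach applies.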
If I run a simple query, I get only 4 items?
Yes
If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?
No. FilterExpression and ProjectionExpression are applied after the items have been read, so the 1 MB limit is measured against the full items. You still get 4 items.
If I only need to count items (select: 'COUNT'), will it count all items (10k)?
No, still just 4 per request.
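The figure of 4 in the answers above comes from dividing the 1 MB page limit by the item size. A rough sketch of that arithmetic, assuming 1 MB = 1024 KB and ignoring how DynamoDB treats the item that crosses the boundary; `itemsPerPage` is a hypothetical helper for illustration only:

```javascript
// Hypothetical helper: rough estimate of how many whole items of a given
// size fit into one Query page, assuming a 1 MB (1024 KB) page limit.
function itemsPerPage(itemSizeKB, pageLimitKB = 1024) {
  return Math.floor(pageLimitKB / itemSizeKB);
}

console.log(itemsPerPage(250)); // 250 KB items, as in the question
```

Because the limit is measured on the items as read, a projection or a count does not change this estimate.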
What you are probably missing is that you can still get all 10k results, or the 10k count; you just need to fetch the results in pages. When a query completes, check the LastEvaluatedKey attribute of the response, and if it is not empty, run the query again with that value as ExclusiveStartKey to get the next set of results. Repeat this until the attribute is empty, at which point you know you have all the results.
EDIT: I should say that some of the SDKs abstract this away for you. For example, the Java SDK has query and queryPage, where query will go back to the server multiple times to get the full result set for you (i.e. in your case, give you the full 10k results).
For any operation that returns items, you can request a subset of attributes to retrieve; however, doing so has no impact on the item size calculations. In addition, Query and Scan can return item counts instead of attribute values. Getting the count of items uses the same quantity of read capacity units and is subject to the same item size calculations. This is because DynamoDB has to read each item in order to increment the count.
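To get a full count when the matching items span several pages, the per-page counts have to be added up. Below is a minimal sketch under some assumptions: `countAll` is a hypothetical helper, and `client` is any object whose `query(params).promise()` resolves to a response with `Count` and an optional `LastEvaluatedKey`, as the AWS SDK for JavaScript v2 `AWS.DynamoDB` client does. The params must not include a ProjectionExpression, since it cannot be combined with a COUNT select.

```javascript
// Hypothetical helper: sums Count over every page of a Query.
async function countAll(client, params) {
  let total = 0;
  let startKey = null;
  do {
    // Ask for counts only; each page still reads (and bills for) the items.
    const pageParams = { ...params, Select: 'COUNT' };
    if (startKey) {
      pageParams.ExclusiveStartKey = startKey; // resume after the previous page
    }
    const page = await client.query(pageParams).promise();
    total += page.Count;
    startKey = page.LastEvaluatedKey || null;
  } while (startKey);
  return total;
}
```

With the v2 SDK this could be called as `await countAll(new AWS.DynamoDB(), params)`. As the quoted documentation says, counting consumes the same read capacity as fetching the items, so the sketch saves bandwidth rather than capacity.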
Managing Throughput Settings on Provisioned Tables
Great explanation by @f-so-k.
This is how I am handling the query.
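import AWS from 'aws-sdk';

// Low-level client; it expects typed attribute values such as { S: 'text' }.
const dynamodb = new AWS.DynamoDB();

// Pages through the query until a page contains items or no pages remain,
// then returns that page.
async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      // Resume where the previous page stopped.
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    if (result.Count > 0 || !result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return result;
}

// `user` (the table name) and `name` are assumed to be defined elsewhere.
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un=:n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Return every attribute that is projected into the index.
  Select: 'ALL_PROJECTED_ATTRIBUTES',
};
const result = await loopQuery(params);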
Edit:
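import AWS from 'aws-sdk';

// Low-level client; it expects typed attribute values such as { S: 'text' }.
const dynamodb = new AWS.DynamoDB();

// Follows LastEvaluatedKey through every page and returns all items.
async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  let list = [];
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      // Resume where the previous page stopped.
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // Collect the items from every page, not just the last one.
    list = [...list, ...result.Items];
    if (!result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return list;
}

// `user` (the table name) and `name` are assumed to be defined elsewhere.
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un=:n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Return every attribute that is projected into the index.
  Select: 'ALL_PROJECTED_ATTRIBUTES',
};
const result = await loopQuery(params);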