In the docs for DynamoDB it says:
In a Query operation, DynamoDB retrieves the items in sorted order, and then processes the items using
KeyConditionExpression
and anyFilterExpression
that might be present.
And:
A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any
FilterExpression
is applied to the results.
Does this mean, that KeyConditionExpression
is applied before this 1MB limit?
Limiting the number of items in the result set The Query operation allows you to limit the number of items that it reads. To do this, set the Limit parameter to the maximum number of items that you want. For example, suppose that you Query a table, with a Limit value of 6 , and without a filter expression.
DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second.
The maximum provisioned throughput you can request is 10,000 write capacity unit and 10,000 read capacity unit for both auto scaling and manual throughput provisioning. If you want to exceed this limit then you have to contact Amazon before hand to get the access.
The size of a number is approximately (length of attribute name) + (1 byte per two significant digits) + (1 byte). A binary value must be encoded in base64 format before it can be sent to DynamoDB, but the value's raw byte length is used for calculating size.
Indeed, your interpretation is correct. With KeyConditionExpression
, DynamoDB can efficiently fetch only the data matching its criteria, and you only pay for this matching data and the 1MB read size applies to the matching data. But with FilterExpression
the story is different: DynamoDB has no efficient way of filtering out the non-matching items before actually fetching all of it then filtering out the items you don't want. So you pay for reading the entire unfiltered data (before FilterExpression
), and the 1MB maximum also corresponds to the unfiltered data.
If you're still unconvinced that this is the way it should be, here's another issue to consider: Imagine that you have 1 gigabyte of data in your database to be Scan'ed (or in a single key to be Query'ed), and after filtering, the result will be just 1 kilobyte. Were you to make this query and expect to get the 1 kilobyte back, Dynamo would need to read and process the entire 1 gigabyte of data before returning. This could take a very long time, and you would have no idea how much, and will likely timeout while waiting for the result. So instead, Dynamo makes sure to return to you after every 1MB of data it reads from disk (and for which you pay ;-)). Control will return to you 1000 (=1 gigabyte / 1 MB) times during the long query, and you won't have a chance to timeout. Whether a 1MB limit actually makes sense here or it should have been more, I don't know, and maybe we should have had a different limit for the response size and the read amount - but definitely some sort of limit was needed on the read amount, even if it doesn't translate to large responses.
By the way, the Scan
documentation includes a slightly differently-worded version of the explanation of the 1MB limit, maybe you will find it clearer than the version in the Query
documentation:
A single Scan operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With