
DynamoDB: When does the 1 MB limit for queries apply?

In the docs for DynamoDB it says:

In a Query operation, DynamoDB retrieves the items in sorted order, and then processes the items using KeyConditionExpression and any FilterExpression that might be present.

And:

A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression is applied to the results.

Does this mean that the KeyConditionExpression is applied before the 1 MB limit?

Asked by J. Hesters on Jun 17 '19 at 12:06

1 Answer

Indeed, your interpretation is correct. With a KeyConditionExpression, DynamoDB can efficiently fetch only the items that match it, so you pay only for reading those items, and the 1 MB read limit applies to the matching data alone. With a FilterExpression the story is different: DynamoDB has no efficient way to skip the non-matching items, so it must first read them and only then discard the ones you don't want. You therefore pay for reading all of the unfiltered data (before the FilterExpression is applied), and the 1 MB maximum is likewise measured against the unfiltered data.
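To make the order of operations concrete, here is a toy Python model of a single Query page. It is a simplified sketch, not DynamoDB's actual engine: it assumes each item carries a sort key `sk` and a precomputed byte `size`, treats "1 MB" as 1,048,576 bytes, and uses hypothetical names (`query_page`, `BytesRead`). `Count`, `ScannedCount` and `LastEvaluatedKey` mirror fields of a real Query response, although here `LastEvaluatedKey` is just a sort-key string.

```python
from typing import Callable, Optional

# Hypothetical item model: a dict with a sort key "sk", an approximate
# byte size "size", and any other attributes.
Item = dict


def query_page(
    partition: list[Item],
    key_condition: Callable[[str], bool],
    filter_expr: Optional[Callable[[Item], bool]] = None,
    max_bytes: int = 1024 * 1024,  # "1 MB", assumed here to be 1,048,576 bytes
    exclusive_start_key: Optional[str] = None,
) -> dict:
    """Toy model of one Query page (an illustration, not the real engine).

    1. The key condition selects a range of sort keys; items outside it
       are never read, so they cost nothing and use none of the budget.
    2. Items in that range are read in sort-key order until the byte
       budget would be exceeded.
    3. Only then is the filter applied to what was read.
    """
    in_range = sorted(
        (it for it in partition if key_condition(it["sk"])),
        key=lambda it: it["sk"],
    )
    if exclusive_start_key is not None:
        # Resume strictly after the key returned by the previous page.
        in_range = [it for it in in_range if it["sk"] > exclusive_start_key]

    read, used = [], 0
    for it in in_range:
        # Always read at least one item so every page makes progress.
        if read and used + it["size"] > max_bytes:
            break
        read.append(it)
        used += it["size"]

    # The filter runs after the read; discarded items were still paid for.
    matched = [it for it in read if filter_expr is None or filter_expr(it)]
    more = len(read) < len(in_range)
    return {
        "Items": matched,
        "Count": len(matched),
        "ScannedCount": len(read),
        "BytesRead": used,
        "LastEvaluatedKey": read[-1]["sk"] if more else None,
    }
```

For example, with five 300-byte items whose `sk` starts with `b#`, a `max_bytes` of 1000 and a filter matching only `b#4`, the first page reads three items (900 bytes), returns none of them, and hands back `"b#2"` as the continuation key. Items under `a#` are never read at all, which is the sense in which the key condition applies before the limit.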

If you're still unconvinced that this is how it should work, here's another issue to consider. Imagine that you have 1 gigabyte of data in your table to be scanned (or in a single partition key to be queried), and after filtering the result will be just 1 kilobyte. If you made this request and expected to get that 1 kilobyte back in a single response, DynamoDB would need to read and process the entire gigabyte before returning. That could take a very long time, you would have no idea how long, and the request would likely time out while you waited. So instead, DynamoDB makes sure to return to you after every 1 MB of data it reads from disk (and for which you pay ;-)). Control will return to you roughly 1000 times (1 GB / 1 MB) during this long query, so you never risk a timeout. Whether 1 MB is the right number, or whether it should have been larger, I don't know; perhaps there should have been separate limits for the response size and for the amount read. But some limit on the amount read was definitely needed, even if it doesn't translate into large responses.
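Because DynamoDB hands control back after each 1 MB read, a caller that wants the complete result has to loop, passing each response's `LastEvaluatedKey` back as `ExclusiveStartKey` until no key is returned. Below is a minimal sketch, assuming `table` behaves like a boto3 `Table` resource whose `query()` returns a dict with `Items` and, while more data remains, `LastEvaluatedKey`. The helper name `query_all`, the table name and the attribute names in the commented usage are hypothetical.

```python
from typing import Any, Iterator


def query_all(table: Any, **query_kwargs: Any) -> Iterator[dict]:
    """Yield every item of a Query, following LastEvaluatedKey across pages.

    Assumes table.query(**kwargs) returns a dict with "Items" and, when
    more data remains, "LastEvaluatedKey" (as a boto3 Table resource does).
    """
    kwargs = dict(query_kwargs)
    while True:
        response = table.query(**kwargs)
        # A page may contain zero items if the filter discarded everything
        # read in that 1 MB chunk; that is not the end, so keep going.
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume the next read right after the last item evaluated.
        kwargs["ExclusiveStartKey"] = last_key


# Hypothetical usage with boto3 (table and attribute names are made up):
# import boto3
# from boto3.dynamodb.conditions import Attr, Key
# table = boto3.resource("dynamodb").Table("my-table")
# items = list(query_all(
#     table,
#     KeyConditionExpression=Key("pk").eq("user#1"),
#     FilterExpression=Attr("status").eq("active"),
# ))
```

The loop tests for a missing `LastEvaluatedKey` rather than for an empty `Items` list, because an empty page only means that nothing in that chunk survived the filter.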

By the way, the Scan documentation words the explanation of the 1 MB limit slightly differently; maybe you will find it clearer than the version in the Query documentation:

A single Scan operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
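The Scan wording suggests a similar toy model for the `Limit` parameter: reading stops at `Limit` items or at the byte budget, whichever comes first, and only then is the filter applied, so `Limit` caps the items read, not the items returned. This is a simplified sketch, not DynamoDB's implementation; `scan_page`, `start` (a list index standing in for `ExclusiveStartKey`) and `NextStart` are hypothetical names, and "1 MB" is again assumed to be 1,048,576 bytes.

```python
from typing import Callable, Optional


def scan_page(
    items: list[dict],
    filter_expr: Callable[[dict], bool],
    limit: Optional[int] = None,
    max_bytes: int = 1024 * 1024,  # "1 MB", assumed to be 1,048,576 bytes
    start: int = 0,
) -> dict:
    """Toy model of one Scan page (illustration only).

    Reads items in order from `start` until either `limit` items have
    been read or the next item would exceed the byte budget, then
    applies the filter to what was read.
    """
    read, used = [], 0
    for it in items[start:]:
        if limit is not None and len(read) >= limit:
            break  # Limit counts items read, before filtering
        if read and used + it["size"] > max_bytes:
            break  # byte budget reached (always read at least one item)
        read.append(it)
        used += it["size"]

    matched = [it for it in read if filter_expr(it)]
    next_start = start + len(read)
    return {
        "Items": matched,
        "Count": len(matched),
        "ScannedCount": len(read),
        "NextStart": next_start if next_start < len(items) else None,
    }
```

With 20 items of 100 bytes and a filter matching only the last five, `scan_page(items, f, limit=5)` returns `Count == 0` and `ScannedCount == 5` on the first page, even though matching items exist further on; only a page starting at index 15 returns all five matches.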

Answered by Nadav Har'El on Sep 22 '22 at 19:09