It's unclear to me, after reading the docs, how many read capacity units are consumed during a scan operation with a filter in DynamoDB. For example, with this Ruby request:
table.items.where(:MyAttribute => "Some Value").each do |item_data|
  # do something with the item_data
end
My understanding is that this will result in a table scan, but DynamoDB will only return the items I'm interested in. But if my table has 10,000 items and only 5 of them get through my filter, am I still being "charged" for a huge number of read capacity units?
The attribute I'm filtering on is not a hash key, a range key, or a secondary index. I only had to add that attribute recently, and unexpectedly, which is why I'm not using a query instead.
One read capacity unit (RCU) buys one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB. A transactional read request costs 2 RCUs for one read per second of an item up to 4 KB. For items larger than 4 KB, the number of units required = (total item size / 4 KB), rounded up.
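For illustration, here's that arithmetic in plain Ruby (the 9.5 KB item size is just a made-up example):

item_size_bytes = 9_500
four_kb_blocks  = (item_size_bytes / 4096.0).ceil   # => 3 blocks of 4 KB

strongly_consistent_rcus   = four_kb_blocks * 1.0   # => 3.0
eventually_consistent_rcus = four_kb_blocks * 0.5   # => 1.5
transactional_rcus         = four_kb_blocks * 2.0   # => 6.0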
Should I use DynamoDB scans? Generally speaking, no. Scans are expensive, slow, and against best practices. To fetch one item by its key, use the Get operation; to fetch a collection of items, use Query, as in the sketch below.
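A minimal sketch of both, using the current aws-sdk-dynamodb gem rather than the older v1 API from the question; the table name, key schema, and region here are hypothetical:

require "aws-sdk-dynamodb"

client = Aws::DynamoDB::Client.new(region: "us-east-1")

# Get: fetch a single item by primary key; consumes capacity for that item only.
item = client.get_item(
  table_name: "MyTable",
  key: { "Id" => "item-123" }
).item

# Query: fetch the items sharing a partition key; reads only the matching items.
resp = client.query(
  table_name: "MyTable",
  key_condition_expression: "Id = :id",
  expression_attribute_values: { ":id" => "item-123" }
)
resp.items.each { |i| puts i }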
One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB must consume additional write capacity units. Transactional write requests require 2 write capacity units to perform one write per second for items up to 1 KB.
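The same back-of-the-envelope math for writes (again with a made-up item size):

item_size_bytes = 2_600
one_kb_blocks   = (item_size_bytes / 1024.0).ceil   # => 3 blocks of 1 KB

standard_write_wcus      = one_kb_blocks * 1        # => 3
transactional_write_wcus = one_kb_blocks * 2        # => 6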
A read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size (see the Read consistency section of the docs for the consistency models). For example, a table provisioned with 10 read capacity units can serve 10 strongly consistent reads per second, or 20 eventually consistent reads per second, for items up to 4 KB.
In short, you are "charged" for the total number of items scanned, not the number of items returned. Compared to Query, Scan is (as you already suspected) an expensive operation.
Worth mentioning is that invoking a scan does not mean the whole table is read in a single request. If the size of the scanned items exceeds the 1 MB limit, the scan stops, and you have to invoke it again to scan the next portion of the table (a sketch of that loop follows the quote below).
This is taken from the official docs:
If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A scan can result in no table data meeting the filter criteria.
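In Ruby, the paging loop could look roughly like this (aws-sdk-dynamodb gem; the table name, attribute, and region are hypothetical):

require "aws-sdk-dynamodb"

client = Aws::DynamoDB::Client.new(region: "us-east-1")
params = {
  table_name: "MyTable",
  filter_expression: "MyAttribute = :v",
  expression_attribute_values: { ":v" => "Some Value" }
}

loop do
  resp = client.scan(params)
  resp.items.each { |item| puts item }  # only items that passed the filter
  break unless resp.last_evaluated_key  # nil once the whole table has been scanned
  params[:exclusive_start_key] = resp.last_evaluated_key
end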
The filter is applied after the items have been read, so it does not reduce the consumed capacity at all; you pay for every item scanned, whether or not it passes the filter.
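You can see this directly in the scan response: ScannedCount is what you pay for, Count is what survives the filter, and return_consumed_capacity asks DynamoDB to report the capacity actually charged. A sketch with hypothetical names:

require "aws-sdk-dynamodb"

client = Aws::DynamoDB::Client.new(region: "us-east-1")
resp = client.scan(
  table_name: "MyTable",
  filter_expression: "MyAttribute = :v",
  expression_attribute_values: { ":v" => "Some Value" },
  return_consumed_capacity: "TOTAL"
)

puts resp.scanned_count                     # items read, e.g. 10000 -- what you pay for
puts resp.count                             # items returned, e.g. 5
puts resp.consumed_capacity.capacity_units  # RCUs consumed for this page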
If you are going to perform these operations regularly, it may be worth adding a secondary index or rethinking your hash and range keys.
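For example, if MyAttribute were backed by a global secondary index (a hypothetical one named "MyAttribute-index" here), the same lookup would become a cheap Query that reads only the 5 matching items:

require "aws-sdk-dynamodb"

client = Aws::DynamoDB::Client.new(region: "us-east-1")
resp = client.query(
  table_name: "MyTable",
  index_name: "MyAttribute-index",
  key_condition_expression: "MyAttribute = :v",
  expression_attribute_values: { ":v" => "Some Value" }
)
resp.items.each { |item| puts item }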