
Read capacity cost of a DynamoDB table scan

It's unclear to me, after reading the docs, how many read capacity units are consumed during a scan operation with a filter in DynamoDB. For example, with this ruby request:

table.items.where(:MyAttribute => "Some Value").each do |item_data|
   # do something with the item_data
end

My understanding is that this will result in a table scan but DynamoDB will only return the items that I'm interested in. But if my table has 10000 items, and only 5 of those items are what gets through my filter, am I still being "charged" for a huge number of read capacity units?

The attribute I'm using for the filter is not a hash, range or secondary index. I've just had to add that attribute recently, and unexpectedly, which is why I'm not using a query instead.

RTF asked Jul 21 '15 09:07



1 Answer

In short, you are "charged" for the items scanned, not the items returned: the read capacity consumed depends on the total size of the data DynamoDB reads during the scan. Compared to a query, as you already mentioned, a scan is an expensive operation.

It is worth mentioning that a single scan request does not necessarily read the whole table. If the total size of the scanned items exceeds the 1 MB limit, the scan stops and you have to invoke it again to scan the next portion of the table.

This is taken from the official docs:

If the total number of scanned items exceeds the maximum data set size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A scan can result in no table data meeting the filter criteria.
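The paging described above can be sketched in Ruby as follows. This is a minimal sketch, not the asker's code: it assumes `client` responds to `scan` the way `Aws::DynamoDB::Client` from the aws-sdk-dynamodb gem does, and the method name `scan_with_filter` is hypothetical. It follows `LastEvaluatedKey` until the table is exhausted and adds up the scanned count and consumed capacity that each page reports.

```ruby
# Sketch: page through a filtered scan and total what each page reports.
# Assumption: `client` behaves like Aws::DynamoDB::Client#scan.
# `scan_with_filter` is a hypothetical helper name.
def scan_with_filter(client, table_name, value)
  totals = { items: [], scanned: 0, capacity: 0.0 }
  start_key = nil

  loop do
    params = {
      table_name: table_name,
      filter_expression: '#attr = :v',
      expression_attribute_names: { '#attr' => 'MyAttribute' },
      expression_attribute_values: { ':v' => value },
      return_consumed_capacity: 'TOTAL'
    }
    # Resume where the previous page stopped, if there was one.
    params[:exclusive_start_key] = start_key if start_key

    page = client.scan(params)
    totals[:items].concat(page.items)          # items that passed the filter
    totals[:scanned] += page.scanned_count     # items read before filtering
    if page.consumed_capacity
      totals[:capacity] += page.consumed_capacity.capacity_units
    end

    start_key = page.last_evaluated_key
    break if start_key.nil? || start_key.empty?
  end

  totals
end
```

Comparing `totals[:scanned]` with `totals[:items].length` after a run shows how many items were read versus how many the filter let through.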

The filter is applied after the items have been read, so it does not reduce the read capacity consumed at all; it only reduces how much data is sent back to you.
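To put a rough number on the 10000-item example from the question, here is a back-of-the-envelope estimate in Ruby. It assumes one read capacity unit covers up to 4 KB of strongly consistent reads, that an eventually consistent scan costs half as much, and that the cost follows the total size scanned. The average item size of 200 bytes is an invented figure for illustration, the method name is hypothetical, and per-page rounding is ignored.

```ruby
# Rough estimate of read capacity units for a full table scan.
# Assumptions: 4 KB per RCU (strongly consistent), half that for
# eventually consistent reads; per-page rounding is ignored.
def estimated_scan_rcus(item_count:, avg_item_bytes:, consistent_read: false)
  total_bytes = item_count * avg_item_bytes
  units = (total_bytes / 4096.0).ceil
  consistent_read ? units.to_f : units / 2.0
end

# Hypothetical table: 10,000 items of about 200 bytes each.
puts estimated_scan_rcus(item_count: 10_000, avg_item_bytes: 200)
# prints 244.5
```

The number of items that pass the filter does not appear anywhere in the calculation, which is the point: 5 matches or 5000 matches cost the same to scan.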

If you are going to perform these operations regularly, consider adding a secondary index on that attribute, or redesigning the hash and range keys, so that the lookup can be done with a query instead of a scan.

Smajl answered Sep 28 '22 09:09