
When I do a scan() in DynamoDB with no filter and retrieve only 10 objects, does it still access the entire table?

For example (using Boto):

import boto

db = boto.connect_dynamodb()

table = db.get_table('MyTable')
res = table.scan(attributes_to_get=['id'], max_results=10)

for item in res:
    print(item)

If I have 1,000 objects in my table, will it scan all of them, or stop after 10? If this does indeed read all 1,000 objects, how can I have it read only the first 10?

asked Oct 07 '22 19:10 by ensnare


1 Answer

According to the documentation on capacity unit calculation, at most 1 MB of data is evaluated per request:

In case of a scan operation, it is not the size of items returned by scan, rather it is the size of items evaluated by Amazon DynamoDB. That is, for a scan request, Amazon DynamoDB evaluates up to 1 MB of items and returns only the items that satisfy the scan condition.

For a table with 'only' 1,000 items, a scan would in theory read the whole table each time, provided the table fits within 1 MB. Fortunately, the 'limit' parameter (whose maximum value is 100) lets you stop the process earlier, so that at most limit items are evaluated and returned.

If your request does not involve any conditions, the scanned item count will equal the number of results. Otherwise, it may be much greater, but the cumulative size of the scanned items cannot exceed the 1 MB boundary.
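To make the stopping rules above concrete, here is a toy model of a single scan request in plain Python. It is only a sketch, not the DynamoDB or boto API: simulate_scan, the (item, size) pairs and the 100-byte item size are all hypothetical, and the handling of the 1 MB boundary is simplified (the model stops before crossing it).

```python
ONE_MB = 1024 * 1024  # per-request evaluation cap described above


def simulate_scan(items, limit=None, condition=None, max_bytes=ONE_MB):
    """Toy model of one scan request; this is NOT the real DynamoDB API.

    `items` is a list of (item, size_in_bytes) pairs.
    Returns (matching_items, scanned_count).
    """
    results = []
    scanned_count = 0
    scanned_bytes = 0
    for item, size in items:
        # Stop once `limit` items have been evaluated.
        if limit is not None and scanned_count >= limit:
            break
        # Simplification: stop before the evaluated data would exceed the cap.
        if scanned_bytes + size > max_bytes:
            break
        scanned_count += 1
        scanned_bytes += size
        # Only items satisfying the condition (if any) are returned.
        if condition is None or condition(item):
            results.append(item)
    return results, scanned_count


# 1,000 hypothetical items of 100 bytes each (about 98 KB in total).
table = [({'id': i}, 100) for i in range(1000)]

# No condition, no limit: the whole table fits under 1 MB, so all of it is read.
all_items, scanned_all = simulate_scan(table)

# No condition, limit=10: evaluation stops after 10 items.
first_ten, scanned_ten = simulate_scan(table, limit=10)

# With a condition, the scanned count can exceed the number of results.
evens, scanned_evens = simulate_scan(
    table, condition=lambda item: item['id'] % 2 == 0)

print("%d returned, %d scanned" % (len(all_items), scanned_all))
print("%d returned, %d scanned" % (len(first_ten), scanned_ten))
print("%d returned, %d scanned" % (len(evens), scanned_evens))
```

In this model, the unfiltered scan with limit=10 evaluates exactly 10 items, while the filtered scan returns 500 items but evaluates all 1,000.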

For scan operations, Amazon considers that you have consumed:

consumed_capacity = math.ceil(sum(parsed_data_size) / 1024.0)  # sizes in bytes, 1 KB per capacity unit
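As a worked example of the formula above, the following sketch computes the consumed capacity for two scans. The function name, the KB constant and the 300-byte item size are assumptions made purely for illustration; the 1 KB unit is taken from the formula as stated here.

```python
import math

KB = 1024.0  # bytes per capacity unit, as in the formula above


def consumed_capacity(parsed_data_sizes):
    """Capacity units for one scan, given the sizes (in bytes) of the evaluated items."""
    return int(math.ceil(sum(parsed_data_sizes) / KB))


# Ten hypothetical items of 300 bytes each: 3,000 bytes evaluated.
small_scan = consumed_capacity([300] * 10)

# 1,000 hypothetical items of 300 bytes each: 300,000 bytes evaluated.
full_scan = consumed_capacity([300] * 1000)

print(small_scan)  # 3
print(full_scan)   # 293
```

Under these assumptions, limiting the scan to 10 items costs 3 units where reading the whole table costs 293, because the cost follows the data evaluated, not the data returned.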

But please don't take my word for it; check the scanned count yourself:

import boto
db = boto.connect_dynamodb()

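# Note the low-level "layer1" call: it returns the raw response dictionary
res = db.layer1.scan('MyTable', attributes_to_get=['id'], limit=10)

# Number of items DynamoDB actually evaluated for this request
print(res['ScannedCount'])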
answered Oct 14 '22 02:10 by yadutaf