
Simple example of retrieving 500 items from dynamodb using Python

Looking for a simple example of retrieving 500 items from DynamoDB while minimizing the number of queries. I know there's a "multiget" function that would let me break this up into chunks of 50 queries, but I'm not sure how to do this.

I'm starting with a list of 500 keys. I'm then thinking of writing a function that takes this list of keys, breaks it up into "chunks," retrieves the values, stitches them back together, and returns a dict of 500 key-value pairs.

Or is there a better way to do this?

As a corollary, how would I "sort" the items afterwards?

asked Aug 25 '12 12:08 by ensnare


People also ask

What is the maximum number of items that the BatchGetItem API can retrieve from DynamoDB?

The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key. A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items.
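As a back-of-the-envelope check, the two per-request limits give a lower bound on how many BatchGetItem calls 500 items need. This sketch assumes the 100-item and 16 MB limits quoted above (16 MB is approximated as 16 × 1024² bytes); `min_batch_calls` is a hypothetical helper, and throttling, which shows up as UnprocessedKeys, can only push the real count higher.

```python
import math

MAX_ITEMS_PER_CALL = 100           # BatchGetItem item limit per request
MAX_BYTES_PER_CALL = 16 * 1024**2  # assumed 16 MB response size limit


def min_batch_calls(n_items, total_bytes):
    """Lower bound on the number of BatchGetItem calls needed.

    Each call returns at most MAX_ITEMS_PER_CALL items and at most
    MAX_BYTES_PER_CALL bytes, so whichever limit is tighter wins.
    """
    by_count = math.ceil(n_items / MAX_ITEMS_PER_CALL)
    by_size = math.ceil(total_bytes / MAX_BYTES_PER_CALL)
    return max(by_count, by_size)


# 500 small items (about 1 KB each): the item-count limit dominates.
print(min_batch_calls(500, 500 * 1024))  # 5
```

So for the question's 500 small items, five requests is the best case, compared with 500 individual GetItem calls.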

Which of the following is the fastest way to get an item from DynamoDB?

GetItem – Retrieves a single item from a table. This is the most efficient way to read a single item because it provides direct access to the physical location of the item. (DynamoDB also provides the BatchGetItem operation, allowing you to perform up to 100 GetItem calls in a single operation.)


1 Answer

Depending on your schema, there are two ways to retrieve your 500 items efficiently.

1. Items are under the same hash_key, distinguished by a range_key

  • Use the query method with the hash_key.
  • You can ask for the results sorted by range_key, A-Z or Z-A.

2. Items are on "random" keys

  • You said it: use the BatchGetItem method.
  • Good news: the limit is actually 100 items or 1 MB of data per request (AWS has since raised the size limit to 16 MB).
  • You will have to sort the results on the Python side.
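For the first case, here is a minimal sketch of a sorted query. It targets the low-level client of boto3, the current AWS SDK for Python, rather than the legacy boto library used in this answer; the client is passed in as a parameter, so the function itself needs no import. `query_all` is a hypothetical helper, and the hash value must already be in DynamoDB's typed form (e.g. `{"S": "alice"}`). `ScanIndexForward` is the Query parameter that selects ascending or descending range-key order, and `LastEvaluatedKey` signals that the 1 MB page limit cut the result short.

```python
def query_all(client, table_name, hash_key_name, hash_value, descending=False):
    """Fetch every item under one hash key, sorted by range key.

    `client` is assumed to behave like boto3.client("dynamodb");
    only its query() method is used.
    """
    items = []
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "#hk = :hv",
        "ExpressionAttributeNames": {"#hk": hash_key_name},
        "ExpressionAttributeValues": {":hv": hash_value},
        # DynamoDB orders results by range key: True = A-Z, False = Z-A.
        "ScanIndexForward": not descending,
    }
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        # A LastEvaluatedKey means the page was truncated; continue from it.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

With a real client this might be called as `query_all(boto3.client("dynamodb"), "MyTable", "user_id", {"S": "alice"}, descending=True)`, where the table and attribute names are placeholders.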

On the practical side, since you use Python, I highly recommend the Boto library for low-level access or the dynamodb-mapper library for higher-level access (disclaimer: I am one of the core devs of dynamodb-mapper).

Sadly, neither of these libraries provides an easy way to wrap the batch_get operation. By contrast, there are generators for scan and query that 'pretend' you get everything in a single query.

To get optimal results with the batch query, I recommend this workflow:

  • Submit a batch with your keys (at most 100 keys per request, so split the 500 keys into chunks of 100).
  • Store the results in a dict.
  • Re-submit the UnprocessedKeys as many times as needed.
  • Sort the results on the Python side.
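The workflow above can be sketched with the boto3 low-level client (a swap from the legacy boto library used in the rest of this answer). `batch_get_all` and `sorted_by_numeric_key` are hypothetical helpers; the keys are assumed to be full primary keys in DynamoDB's typed form, the sort helper assumes a numeric key attribute, and the backoff constants are arbitrary. The UnprocessedKeys map returned by BatchGetItem has the same shape as RequestItems, so it can be resubmitted as-is.

```python
import time


def batch_get_all(client, table_name, keys, chunk_size=100, max_retries=8):
    """Fetch many items with BatchGetItem: chunk, submit, resubmit leftovers.

    `client` is assumed to behave like boto3.client("dynamodb");
    `keys` look like [{"id": {"N": "42"}}, ...]. Returns raw items, unordered.
    """
    items = []
    for start in range(0, len(keys), chunk_size):
        # BatchGetItem accepts at most 100 keys per request.
        request = {table_name: {"Keys": keys[start:start + chunk_size]}}
        attempt = 0
        while request:
            resp = client.batch_get_item(RequestItems=request)
            items.extend(resp.get("Responses", {}).get(table_name, []))
            # Keys DynamoDB did not serve this time (throttling, size cap).
            request = resp.get("UnprocessedKeys") or {}
            if request:
                attempt += 1
                if attempt > max_retries:
                    raise RuntimeError("UnprocessedKeys still pending after retries")
                # Exponential backoff before resubmitting the leftovers.
                time.sleep(min(0.05 * 2 ** attempt, 2.0))
    return items


def sorted_by_numeric_key(items, key_name):
    # BatchGetItem returns items in no particular order, so sort client-side.
    return sorted(items, key=lambda item: int(item[key_name]["N"]))
```

With a real client this might be called as `batch_get_all(boto3.client("dynamodb"), "MyTable", [{"id": {"N": str(i)}} for i in range(500)])`. To get the dict of 500 key-value pairs the question asks for, index the result with `{item["id"]["N"]: item for item in items}`.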

Quick example

I assume you have created a table "MyTable" with a single hash_key.

import boto

# Helper function. This is more or less the code
# I added to the develop branch
def resubmit(batch, prev):
    # Empty (re-use) the batch
    del batch[:]

    # The batch answer contains the list of
    # unprocessed keys grouped by tables
    unprocessed = prev.get('UnprocessedKeys')
    if not unprocessed:
        # Missing or empty UnprocessedKeys: nothing left to fetch
        return None

    # Load the unprocessed keys
    for table_name, table_req in unprocessed.items():
        table_keys = table_req['Keys']
        table = batch.layer2.get_table(table_name)

        keys = []
        for key in table_keys:
            h = key['HashKeyElement']
            r = None
            if 'RangeKeyElement' in key:
                r = key['RangeKeyElement']
            keys.append((h, r))

        attributes_to_get = None
        if 'AttributesToGet' in table_req:
            attributes_to_get = table_req['AttributesToGet']

        batch.add_batch(table, keys, attributes_to_get=attributes_to_get)

    return batch.submit()

# Main
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
batch = db.new_batch_list()

keys = range(100)  # Get items from 0 to 99

batch.add_batch(table, keys)

res = batch.submit()

while res:
    print(res)  # Do some useful work here
    res = resubmit(batch, res)

# The END
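The "useful work" in the loop is usually merging each response into a dict and sorting at the end. This is a minimal sketch assuming the response shape of the original DynamoDB API as decoded by boto's Layer2, where `res['Responses'][table_name]['Items']` is a list of plain Python dicts; `collect_items` is a hypothetical helper and the hash key name `'id'` in the usage note is a placeholder.

```python
def collect_items(res, table_name, hash_key_name, results):
    """Merge one batch response into `results`, keyed by hash key.

    Assumes the legacy response shape returned by boto's Layer2:
    res['Responses'][table_name]['Items'] is a list of plain dicts.
    """
    table_res = res.get('Responses', {}).get(table_name)
    if table_res:
        for item in table_res.get('Items', []):
            results[item[hash_key_name]] = item
    return results
```

Inside the loop, the print statement can be replaced with `collect_items(res, 'MyTable', 'id', results)` (with `results = {}` created before the first submit), and the sorted list is then `[results[k] for k in sorted(results)]`.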

EDIT:

I've added a resubmit() function to BatchList in the Boto develop branch. It greatly simplifies the workflow:

  1. add all of your requested keys to BatchList
  2. submit()
  3. resubmit() as long as it does not return None.

This should be available in the next release.
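With that method, the three steps reduce to a short loop. This sketch assumes `batch` is a boto BatchList whose keys were already added with add_batch(), and relies only on what the edit states: resubmit() returns None once no UnprocessedKeys remain. `drain_batch` is a hypothetical helper name.

```python
def drain_batch(batch):
    """Run the submit()/resubmit() workflow and return every response.

    `batch` is assumed to be a boto BatchList with keys already added;
    resubmit() is expected to return None when nothing is left.
    """
    responses = []
    res = batch.submit()
    while res is not None:
        responses.append(res)
        res = batch.resubmit()
    return responses
```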

answered Oct 10 '22 20:10 by yadutaf