The package boto3 - Amazon's official AWS API wrapper for Python - has great support for uploading items to DynamoDB in bulk. It looks like this:
import boto3

db = boto3.resource("dynamodb", region_name="my_region").Table("my_table")
with db.batch_writer() as batch:
    for item in my_items:
        batch.put_item(Item=item)
Here my_items is a list of Python dictionaries, each of which must have the table's primary key(s). The situation isn't perfect - for instance, there is no safety mechanism to prevent you from exceeding your throughput limits - but it's still pretty good.
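For illustration, such a list might look like this (the non-key attribute name here is hypothetical):

my_items = [
    {"my_primary_key": "my_key1", "payload": "foo"},
    {"my_primary_key": "my_key2", "payload": "bar"},
]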
However, there does not appear to be any counterpart for reading from the database. The closest I can find is DynamoDB.Client.batch_get_item(), but here the API is extremely complicated. Here's what requesting two items looks like:
db_client = boto3.client("dynamodb", "my_region")
db_client.batch_get_item(
    RequestItems={
        "my_table": {
            "Keys": [
                {"my_primary_key": {"S": "my_key1"}},
                {"my_primary_key": {"S": "my_key2"}},
            ]
        }
    }
)
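For reference, the response to this call has roughly the following shape (abridged, and with a hypothetical payload attribute alongside the key):

{
    "Responses": {
        "my_table": [
            {"my_primary_key": {"S": "my_key1"}, "payload": {"S": "foo"}},
            {"my_primary_key": {"S": "my_key2"}, "payload": {"S": "bar"}}
        ]
    },
    "UnprocessedKeys": {}
}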
This might be tolerable, but the response has the same problem: all values are dictionaries whose keys are data types ("S" for string, "N" for number, "M" for mapping, etc.) and it is more than a little annoying to have to parse everything. So my questions are:
1. Is there any native boto3 support for batch reading from DynamoDB, similar to the batch_writer function above?
2. Failing that, does boto3 provide any built-in way to automatically deserialize the responses to the DynamoDB.Client.batch_get_item() function?
I'll also add that the function boto3.resource("dynamodb").Table().get_item() has what I would consider to be the "correct" API, in that no type-parsing is necessary for inputs or outputs. So it seems that this is some sort of oversight by the developers, and I suppose I'm looking for a workaround.
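For comparison, here is a minimal sketch of that resource-level call, using the same hypothetical table and key names as above; both the key passed in and the item returned are plain Python values:

import boto3

table = boto3.resource("dynamodb", region_name="my_region").Table("my_table")
response = table.get_item(Key={"my_primary_key": "my_key1"})
item = response.get("Item")  # a plain dict, e.g. {"my_primary_key": "my_key1", "payload": "foo"}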
Amazon DynamoDB provides low-level API actions for managing database tables and indexes, and for creating, reading, updating and deleting data. DynamoDB also provides API actions for accessing and processing stream records. This API Reference describes the low-level API for Amazon DynamoDB.
A bulk (batch) update refers to updating multiple rows belonging to a single table. However, DynamoDB does not provide support for this.
The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key. A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items.
So thankfully there is something that you might find useful - much like the json module, which has json.dumps and json.loads, boto3 has a types module (boto3.dynamodb.types) that includes a serializer and a deserializer. See TypeSerializer/TypeDeserializer. If you look at the source code, the serialization/deserialization is recursive and should be perfect for your use case.
Note: It's recommended that you use Binary/Decimal instead of a regular old Python float/int for round-trip conversions.
from boto3.dynamodb.types import TypeSerializer, TypeDeserializer

serializer = TypeSerializer()
serializer.serialize('awesome')  # returns {'S': 'awesome'}

deserializer = TypeDeserializer()
deserializer.deserialize({'S': 'awesome'})  # returns 'awesome'
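Putting the two together gives a workaround for your second question. The helper below is my own sketch (the name batch_get and its signature are invented), and it deliberately ignores the 100-key limit and any UnprocessedKeys in the response, so treat it as a starting point rather than a drop-in utility:

import boto3
from boto3.dynamodb.types import TypeDeserializer, TypeSerializer

def batch_get(table_name, keys, region="my_region"):
    # Serialize the plain-Python keys into DynamoDB's wire format,
    # issue the batch read, then deserialize each returned item.
    client = boto3.client("dynamodb", region)
    serializer = TypeSerializer()
    deserializer = TypeDeserializer()
    request = {
        table_name: {
            "Keys": [
                {name: serializer.serialize(value) for name, value in key.items()}
                for key in keys
            ]
        }
    }
    response = client.batch_get_item(RequestItems=request)
    return [
        {name: deserializer.deserialize(value) for name, value in item.items()}
        for item in response["Responses"][table_name]
    ]

# e.g. batch_get("my_table", [{"my_primary_key": "my_key1"},
#                             {"my_primary_key": "my_key2"}])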
Hopefully this helps!