I am trying to write a huge number of records into DynamoDB and I would like to know the correct way of doing that. Currently I am using the DynamoDBMapper to do the job in a single batchWrite operation, but after reading the documentation I am not sure this is the correct way (especially regarding the limits on the size and number of written items).
Let's say that I have an ArrayList with 10000 records and I am saving it like this:
mapper.batchWrite(recordsToSave, new ArrayList<BillingRecord>());
The first argument is the list of records to be written and the second one contains items to be deleted (no such items in this case).
Does the mapper split this write into multiple writes and handle the limits or should it be handled explicitly?
I have only found examples of batchWrite done directly with the AmazonDynamoDB client (like THIS one). Is using the client directly for batch operations the correct way? If so, what is the point of having a mapper?
Item size: the maximum item size in DynamoDB is 400 KB, which includes both attribute name lengths (UTF-8 binary length) and attribute value lengths (again binary length); attribute names count towards the size limit.
DynamoDB throttling: each partition of a DynamoDB table is subject to a hard limit of 1,000 write capacity units and 3,000 read capacity units.
Does the mapper split your list of objects into multiple batches and then write each batch separately? Yes, it does the batching for you: you can see that it splits the items to be written into batches of up to 25 items here. It then tries writing each batch, and some of the items in each batch can fail. An example of a failure is given in the mapper documentation:
This method fails to save the batch if the size of an individual object in the batch exceeds 400 KB. For more information on batch restrictions see, http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
The example is talking about the size of one record (one BillingRecord instance in your case) exceeding 400 KB, which, at the time of writing this answer, is the maximum size of an item in DynamoDB.
In case a particular batch fails, it moves on to the next batch (sleeping the thread for a bit in case the failure was caused by throttling). In the end, all of the failed batches are returned in a List of FailedBatch instances. Each FailedBatch instance contains the exception that caused the failure and the list of unprocessed items that weren't written to DynamoDB.
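To make that concrete, here is a minimal sketch of inspecting the returned list; the logging policy is just an assumption, the mapper only hands you back the unprocessed items and the exception:

import java.util.ArrayList;
import java.util.List;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;

// Assumes 'mapper' and 'recordsToSave' are the objects from your snippet.
List<DynamoDBMapper.FailedBatch> failedBatches =
        mapper.batchWrite(recordsToSave, new ArrayList<BillingRecord>());

for (DynamoDBMapper.FailedBatch failed : failedBatches) {
    // The exception explains why the batch failed (throttling, oversized item, ...).
    System.err.println("Batch failed: " + failed.getException());
    // Unprocessed items are keyed by table name; these were never written.
    failed.getUnprocessedItems().forEach((table, writeRequests) ->
            System.err.println(writeRequests.size() + " unprocessed items for table " + table));
}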
Is the snippet that you provided the correct way of doing batch writes? I can think of two suggestions. The batchSave method is more appropriate if you have no items to delete. You might also want to think about what you want to do with the failed batches.
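For the first suggestion, a sketch (what you do with a non-empty result is up to you):

// batchSave is equivalent to batchWrite with an empty delete list.
List<DynamoDBMapper.FailedBatch> failures = mapper.batchSave(recordsToSave);

if (!failures.isEmpty()) {
    // Decide what to do with the records that were not written:
    // retry them, write them to a dead-letter store, or fail the whole job.
}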
Is using the client directly the correct way? If so, what is the point of the mapper? The mapper is simply a wrapper around the client. It provides an ORM layer that converts your BillingRecord instances into the sort-of nested hash maps that the low-level client works with. There is nothing wrong with using the client directly, and this does tend to happen in special cases where functionality not offered by the mapper needs to be coded against the low-level API.
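To illustrate the difference, here is a hedged sketch of a single batch through the low-level client; the table name "BillingRecord" and the attribute names are made up for the example:

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult;
import com.amazonaws.services.dynamodbv2.model.PutRequest;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

// Build the attribute map by hand -- this is what the mapper derives from the
// annotations on BillingRecord. Table and attribute names are illustrative only.
Map<String, AttributeValue> item = new HashMap<>();
item.put("id", new AttributeValue().withS("record-1"));
item.put("amount", new AttributeValue().withN("42"));

Map<String, List<WriteRequest>> requestItems = new HashMap<>();
requestItems.put("BillingRecord",
        Collections.singletonList(new WriteRequest(new PutRequest(item))));

// A single call accepts at most 25 write requests; chunking a 10000-element list
// and retrying unprocessed items (ideally with a back-off) is your responsibility
// at this level.
BatchWriteItemResult result = client.batchWriteItem(requestItems);
while (!result.getUnprocessedItems().isEmpty()) {
    result = client.batchWriteItem(result.getUnprocessedItems());
}

Compared with the mapper version you gain full control over chunking and retries, at the cost of writing the marshalling yourself.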