I am using DynamoDB tables with keys and throughput optimized for the application's use cases. To support other ad hoc administrative and reporting use cases, I want to keep a complete backup in S3 (a day-old backup is OK). However, I cannot afford to scan the entire tables to produce that backup, and the keys I have are not sufficient to determine what is "new". How do I do incremental backups? Do I have to modify my DynamoDB schema, or add extra tables, just to do this? Any best practices?
Update: DynamoDB Streams solves this problem.
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near real time.
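For example, here is a minimal sketch that drains whatever is still in the 24-hour stream window and archives it to S3 using boto3. The table and bucket names are placeholders, and a production setup would more commonly attach a Lambda trigger to the stream instead of polling it like this:

```python
import json

import boto3

# Placeholder names for illustration only.
TABLE_NAME = "my-table"
BUCKET = "my-backup-bucket"

dynamodb = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")
s3 = boto3.client("s3")

# The table must have a stream enabled (e.g. NEW_AND_OLD_IMAGES).
stream_arn = dynamodb.describe_table(TableName=TABLE_NAME)["Table"]["LatestStreamArn"]

changes = []
shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
for shard in shards:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # everything still in the 24-hour window
    )["ShardIterator"]
    while iterator:
        page = streams.get_records(ShardIterator=iterator)
        changes.extend(page["Records"])
        iterator = page.get("NextShardIterator")
        if not page["Records"]:
            break  # caught up on this shard

# Persist the captured modifications as one incremental-backup object in S3.
s3.put_object(
    Bucket=BUCKET,
    Key="dynamodb-incremental/changes.json",
    Body=json.dumps(changes, default=str),
)
```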
Amazon DynamoDB offers two types of backup: point-in-time recovery (PITR) and on-demand backups. PITR provides continuous backups of your table and lets you restore your data to any point in time in the preceding 35 days; on-demand backups are full snapshots you create explicitly. Both can be driven from the API, as sketched after the pricing notes below.
All of your data is stored on solid-state drives (SSDs) and is automatically replicated across multiple Availability Zones within an AWS Region, providing built-in high availability and durability. You can use global tables to keep DynamoDB tables in sync across AWS Regions.
Keep in mind that, unless you opt for on-demand capacity mode, every DynamoDB access pattern requires its own allocation of read capacity units and write capacity units, so a backup scan competes with the throughput you have provisioned for the application.
Backup pricing:

- Point-in-time recovery: $0.20 per GB-month
- On-demand (snapshot) backup: $0.10 per GB-month
- Restoring a backup: $0.15 per GB
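Both backup types, plus the related export-to-S3 feature (which requires PITR and does not consume read capacity), can be driven from the API. A minimal boto3 sketch, with the table and bucket names as placeholders:

```python
import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "my-table"  # placeholder

# Turn on continuous backups (PITR) for the table.
dynamodb.update_continuous_backups(
    TableName=TABLE_NAME,
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Take an on-demand (snapshot) backup.
dynamodb.create_backup(TableName=TABLE_NAME, BackupName=f"{TABLE_NAME}-daily")

# With PITR enabled, the table can also be exported straight to S3 without
# consuming any read capacity (bucket name is a placeholder).
table_arn = dynamodb.describe_table(TableName=TABLE_NAME)["Table"]["TableArn"]
dynamodb.export_table_to_point_in_time(
    TableArn=table_arn,
    S3Bucket="my-backup-bucket",
    ExportFormat="DYNAMODB_JSON",
)
```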
I see two options:
Option 1: Generate the current snapshot. You'll have to read the table to do this, which you can do at a very slow rate to stay under your capacity limits (a throttled Scan operation). Then keep an in-memory list of the updates performed over some period of time. You could put these in another table, but you'd have to read those back too, which would probably cost just as much. The interval could be a minute, 10 minutes, an hour: whatever you're comfortable losing if your application exits. Then periodically grab your snapshot from S3, replay these changes onto it, and upload the new snapshot. I don't know how large your data set is, so this may not be practical, but I've seen it done with great success for data sets up to 1-2 GB.
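A rough sketch of the replay step, assuming a hypothetical S3 bucket, a snapshot stored as a JSON object keyed by primary key, and a change-log entry shape invented purely for illustration:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"                   # placeholder bucket
SNAPSHOT_KEY = "backups/table-snapshot.json"  # snapshot stored as {primary_key: item}

def apply_changes_and_upload(change_log):
    """Replay the buffered changes onto the last snapshot and upload the result.

    Each change_log entry uses an invented shape for illustration:
    {"action": "put" | "delete", "key": "<primary key>", "item": {...}}
    """
    body = s3.get_object(Bucket=BUCKET, Key=SNAPSHOT_KEY)["Body"].read()
    snapshot = json.loads(body)

    for change in change_log:
        if change["action"] == "put":
            snapshot[change["key"]] = change["item"]
        elif change["action"] == "delete":
            snapshot.pop(change["key"], None)

    s3.put_object(Bucket=BUCKET, Key=SNAPSHOT_KEY, Body=json.dumps(snapshot))
```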
Option 2: Add read throughput and back up your data with a full scan every day. You say you can't afford it, but it isn't clear whether you mean paying for the capacity or that the scan would use up all the capacity and cause the application to start failing. The only way to pull data out of DynamoDB is to read it, either strongly or eventually consistently. If the backup is part of your business requirements, then you have to decide whether it's worth the cost. You can self-throttle your reads by examining the ConsumedCapacityUnits property on your results (surfaced as ConsumedCapacity in current SDK versions). The Scan operation also has a Limit parameter you can use to cap how much is read in each request, and scans use eventually consistent reads by default, which cost half as much as strongly consistent reads.
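A minimal sketch of such a self-throttled scan with boto3, assuming a hypothetical table name and read-capacity budget:

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")
TABLE_NAME = "my-table"   # placeholder
RCU_PER_SECOND = 10       # how much read capacity the backup is allowed to consume

def slow_scan():
    """Scan the whole table, sleeping so consumption stays near RCU_PER_SECOND."""
    kwargs = {
        "TableName": TABLE_NAME,
        "Limit": 100,                       # cap items evaluated per request
        "ConsistentRead": False,            # eventually consistent: half the RCU cost
        "ReturnConsumedCapacity": "TOTAL",  # ask DynamoDB to report capacity used
    }
    while True:
        page = dynamodb.scan(**kwargs)
        yield from page["Items"]

        # Self-throttle: sleep long enough that the capacity this page consumed
        # averages out to the budget above.
        consumed = page["ConsumedCapacity"]["CapacityUnits"]
        time.sleep(consumed / RCU_PER_SECOND)

        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

for item in slow_scan():
    pass  # e.g. append each item to a local file that is uploaded to S3 at the end
```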