I see there are tons of examples and documentation for copying data from DynamoDB to Redshift, but we are looking for an incremental copy process where only the new rows are copied from DynamoDB to Redshift. We will run this copy process every day, so there is no need to drop and reload the entire Redshift table each day. Does anybody have any experience or thoughts on this topic?
Amazon Redshift complements Amazon DynamoDB with advanced business intelligence capabilities and a powerful SQL-based interface. When you copy data from a DynamoDB table into Amazon Redshift, you can perform complex data analysis queries on that data, including joins with other tables in your Amazon Redshift cluster.
To export a DynamoDB table, you use the AWS Data Pipeline console to create a new pipeline. The pipeline launches an Amazon EMR cluster to perform the actual export. Amazon EMR reads the data from DynamoDB, and writes the data to an export file in an Amazon S3 bucket.
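If you would rather script that setup than click through the console, the pipeline can also be created and activated from code. Below is a minimal sketch using boto3's Data Pipeline client; the pipeline name is made up, and the EmrCluster/EmrActivity objects (which the console's "Export DynamoDB table to S3" template generates for you) are elided.

```python
# Minimal sketch: create and activate a Data Pipeline from code.
# Assumes boto3 is configured with credentials and a default region.
import boto3

dp = boto3.client("datapipeline")

pipeline_id = dp.create_pipeline(
    name="ddb-export-to-s3",         # hypothetical pipeline name
    uniqueId="ddb-export-to-s3-v1",  # idempotency token
)["pipelineId"]

# A real definition needs Default, EmrCluster, and EmrActivity objects;
# the console's export template generates these. Only the Default object
# is shown here.
pipeline_objects = [
    {
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "scheduleType", "stringValue": "ondemand"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
        ],
    },
    # ... EmrCluster and EmrActivity objects elided ...
]

dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```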
A COPY command is the most efficient way to load a table. You can also add data to your tables using INSERT commands, though doing so is much less efficient than using COPY. The COPY command is able to read from multiple data files or multiple data streams simultaneously.
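COPY can read straight from a DynamoDB table as well as from files in S3. Here is a minimal sketch, assuming psycopg2 and an existing target table; the cluster endpoint, table names, and IAM role ARN are placeholders.

```python
# Minimal sketch: run a COPY from DynamoDB into Redshift via psycopg2.
# READRATIO caps the share of the DynamoDB table's provisioned read
# throughput the load is allowed to consume.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="analytics",
    user="loader",
    password="...",
)

copy_sql = """
    COPY events
    FROM 'dynamodb://events'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'  -- hypothetical role
    READRATIO 50;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # the "with conn" block commits on success
```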
DynamoDB has a feature (currently in preview) called Streams:
Amazon DynamoDB Streams maintains a time-ordered sequence of item-level changes in any DynamoDB table in a log for a duration of 24 hours. Using the Streams APIs, developers can query the updates, receive the item-level data before and after the changes, and use it to build creative extensions to their applications built on top of DynamoDB.
This feature will allow you to process new updates as they come in and do what you want with them, rather than designing an export system on top of DynamoDB.
You can see more information about how the processing works in the Reading and Processing DynamoDB Streams documentation.
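To make that concrete, here is a minimal sketch of reading a stream with boto3; the table name is hypothetical, and a real daily loader would batch the records into S3 or Redshift rather than print them.

```python
# Minimal sketch: walk a DynamoDB stream shard by shard.
# Assumes boto3 and a table with Streams enabled.
import boto3

dynamodb = boto3.client("dynamodb")
streams = boto3.client("dynamodbstreams")

# The stream's ARN is exposed on the table description.
stream_arn = dynamodb.describe_table(TableName="events")["Table"]["LatestStreamArn"]

shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
for shard in shards:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # oldest record still in the 24-hour window
    )["ShardIterator"]

    while iterator:
        resp = streams.get_records(ShardIterator=iterator)
        for record in resp["Records"]:
            # eventName is INSERT, MODIFY, or REMOVE; NewImage/OldImage hold
            # the item after/before the change (depending on stream view type).
            print(record["eventName"], record["dynamodb"].get("NewImage"))
        if not resp["Records"]:
            break  # caught up on an open shard; stop instead of spinning
        iterator = resp.get("NextShardIterator")
```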
Redshift's COPY from DynamoDB can only copy the entire table. There are several ways to achieve an incremental copy:
Using an AWS EMR cluster and Hive - if you set up an EMR cluster, you can use Hive tables to run queries on the DynamoDB data and move the results to S3. From there, the data can easily be loaded into Redshift (a sketch of submitting such a Hive step follows the next item).
You can store your DynamoDB data based on access patterns (see http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.TimeSeriesDataAccessPatterns). If you store the data this way, each period's DynamoDB table can be dropped after it has been copied to Redshift (see the second sketch below).
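For the first approach, the Hive query itself lives in a script on S3 (typically a CREATE EXTERNAL TABLE over DynamoDB via the DynamoDBStorageHandler, followed by an INSERT OVERWRITE into an S3-backed table), and you submit it to the cluster as a step. Here is a minimal sketch, assuming boto3 and a running EMR cluster with Hive installed; the cluster ID, bucket, and script name are hypothetical.

```python
# Minimal sketch: submit a Hive script to a running EMR cluster as a step.
# The script at s3://my-bucket/ddb_export.q (hypothetical) would define an
# external Hive table over DynamoDB and write a filtered extract to S3.
import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical cluster ID
    Steps=[
        {
            "Name": "export-dynamodb-to-s3",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hive-script",
                    "--run-hive-script",
                    "--args",
                    "-f",
                    "s3://my-bucket/ddb_export.q",
                ],
            },
        }
    ],
)
```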
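For the second approach, the daily workflow is simple because yesterday's time-series table is immutable: COPY it into Redshift in full (as in the COPY sketch above), verify the load, and then delete the DynamoDB table to stop paying for it. A minimal sketch, with a hypothetical per-day naming scheme:

```python
# Minimal sketch: drop yesterday's per-day DynamoDB table after it has
# been loaded into Redshift and the load verified.
import datetime
import boto3

dynamodb = boto3.client("dynamodb")

yesterday = datetime.date.today() - datetime.timedelta(days=1)
table_name = "events_" + yesterday.strftime("%Y_%m_%d")  # hypothetical naming scheme

# ... run and verify the Redshift COPY against table_name first ...
dynamodb.delete_table(TableName=table_name)
```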