 

Copy data from Amazon S3 to Redshift and avoid duplicate rows

I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table. Is there a way to implement this using the COPY command?

http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html

I tried adding a unique constraint and setting a column as the primary key, with no luck. Redshift does not seem to enforce unique or primary key constraints.

Rups N asked Mar 29 '13 10:03


People also ask

What is the most efficient and fastest way to load data into Redshift?

A COPY command is the most efficient way to load a table. You can also add data to your tables using INSERT commands, though it is much less efficient than using COPY. The COPY command is able to read from multiple data files or multiple data streams simultaneously.
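As an illustration only (the cluster, database, user, bucket, and role names below are placeholders, not anything taken from this question), a COPY pointed at an S3 prefix loads every file under that prefix in parallel, and it can be issued programmatically, for example through the boto3 Redshift Data API:

import boto3

# Hypothetical identifiers; replace with your own cluster, database, user, bucket and IAM role.
redshift_data = boto3.client("redshift-data")

response = redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=(
        "copy customer "
        "from 's3://mybucket/customer/' "  # prefix: COPY loads all files under it in parallel
        "iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';"
    ),
)
print(response["Id"])  # statement id; poll describe_statement() to check completion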


1 Answer

There's another way to truly avoid data duplication, although it's not as straightforward as removing duplicate data after it has been inserted. The COPY command has a manifest option that lets you specify exactly which files you want to copy:

copy customer
from 's3://mybucket/cust.manifest' 
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
manifest; 
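For reference, the cust.manifest file itself is just a small JSON document listing the exact S3 objects to load (the object names below are made up for illustration):

{
  "entries": [
    {"url": "s3://mybucket/cust_part_01", "mandatory": true},
    {"url": "s3://mybucket/cust_part_02", "mandatory": true}
  ]
}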

you can build a lambda that generates a new manifest file every time before you run the copy command. That lambda will compare the files already copied with the new files arrived and will create a new manifest with only the new files so that you will never ingest the same file twice
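This is a minimal sketch, not the answerer's actual code; the bucket name, prefix, manifest key, and the simple JSON file used here to track already-loaded keys are all assumptions made for illustration:

import json
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "mybucket"             # bucket the data files arrive in (assumed)
SOURCE_PREFIX = "incoming/"            # prefix to scan for new files (assumed)
MANIFEST_KEY = "cust.manifest"         # manifest referenced by the COPY command
LOADED_KEY = "state/loaded_keys.json"  # JSON list of already-copied keys (assumed tracking scheme)


def lambda_handler(event, context):
    # 1. Read the list of keys that were already copied (empty on the first run).
    try:
        body = s3.get_object(Bucket=SOURCE_BUCKET, Key=LOADED_KEY)["Body"].read()
        already_loaded = set(json.loads(body))
    except s3.exceptions.NoSuchKey:
        already_loaded = set()

    # 2. List the files currently sitting under the incoming prefix.
    paginator = s3.get_paginator("list_objects_v2")
    current_keys = set()
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PREFIX):
        for obj in page.get("Contents", []):
            current_keys.add(obj["Key"])

    # 3. Keep only the files that have never been loaded.
    new_keys = sorted(current_keys - already_loaded)

    # 4. Write a manifest that points exclusively at the new files.
    manifest = {
        "entries": [
            {"url": f"s3://{SOURCE_BUCKET}/{key}", "mandatory": True}
            for key in new_keys
        ]
    }
    s3.put_object(Bucket=SOURCE_BUCKET, Key=MANIFEST_KEY,
                  Body=json.dumps(manifest).encode("utf-8"))

    # 5. Remember everything that has now been handed off to COPY.
    s3.put_object(Bucket=SOURCE_BUCKET, Key=LOADED_KEY,
                  Body=json.dumps(sorted(already_loaded | set(new_keys))).encode("utf-8"))

    return {"new_files": len(new_keys)}

The COPY command above can then run right after this handler completes; because the manifest only ever lists unseen files, re-running the pipeline never loads the same file twice.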

Daniel answered Sep 18 '22 14:09