With my limited understanding of Redshift, this is my plan for approaching the problem:
I want to take the results of a query and use them as the input for an EMR job. What is the best way to do this programmatically?
Currently my EMR job takes a flat file from S3 as its input, and I use the Amazon Java SDK to set the job up.
Should I write the output of my Redshift query to S3, point my EMR job there, and then remove the file once the EMR job has completed?
Or do Redshift and the AWS SDK offer a more direct way to pipe the query results from Redshift to EMR, cutting out the S3 step?
Thanks
I recently spoke with members of the Amazon Redshift team; they said a solution for this is in the works.
The most common way is to upload the data to Amazon S3 and use the built-in features of Amazon EMR to load the data onto your cluster. You can also use the DistributedCache feature of Hadoop to transfer files from a distributed file system to the local file system.
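As a rough sketch of the S3 route (the bucket names, class name, and helper below are illustrative, not part of any AWS API): the EMR step's input argument is simply the s3:// prefix the data was uploaded to. In real code the argument list would be passed to `HadoopJarStepConfig.withArgs(...)` from the AWS Java SDK; here it is assembled with the standard library only so the example stays self-contained:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class S3InputStep {
    // Validate that a path is an s3:// (or legacy s3n://) URI and return it;
    // the Hadoop job on the EMR side resolves such paths through the S3 filesystem.
    static String s3Path(String path) {
        URI uri = URI.create(path);
        if (!"s3".equals(uri.getScheme()) && !"s3n".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not an S3 path: " + path);
        }
        return path;
    }

    // Argument list for a custom-jar step: input prefix, then output prefix.
    // These are the strings you would hand to HadoopJarStepConfig.withArgs(...).
    static List<String> stepArgs(String input, String output) {
        return Arrays.asList(s3Path(input), s3Path(output));
    }

    public static void main(String[] args) {
        // Hypothetical bucket and key names for illustration.
        System.out.println(stepArgs("s3://my-bucket/input/flatfile.csv",
                                    "s3://my-bucket/output/"));
    }
}
```

The same S3 paths can later be deleted with the SDK's S3 client once the job finishes, which matches the clean-up step described in the question.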
To use the query editor on the Amazon Redshift console: on the navigation menu, choose Query editor, then connect to a database in your cluster. For Schema, choose public to create a new table based on that schema. Enter your CREATE TABLE statement in the query editor window and choose Run to create the table.
This is pretty easy - no need for Sqoop. Add a Cascading Lingual step at the front of your job that executes a Redshift UNLOAD command to S3:
UNLOAD ('select_statement')
TO 's3://object_path_prefix'
[ WITH ] CREDENTIALS [AS] 'aws_access_credentials'
[ option [ ... ] ]
Then you can either process the export directly on S3, or add an S3DistCp step to bring the data onto HDFS first.
This will be a lot more performant than adding Sqoop, and a lot simpler to maintain.
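As a minimal sketch of driving that UNLOAD programmatically (the class and method names below are hypothetical, and the credential values are placeholders): the statement can be built as a string and then executed over JDBC with the standard Redshift/PostgreSQL driver before the EMR job runs. Only the string construction is shown so the example stays self-contained:

```java
public class RedshiftUnload {
    /**
     * Build an UNLOAD statement that exports a query's result set to S3,
     * using the older CREDENTIALS clause from the template above.
     */
    static String buildUnload(String select, String s3Prefix, String credentials) {
        // Per the Redshift docs, single quotes inside the SELECT text
        // must be escaped as \' within the UNLOAD string literal.
        String escaped = select.replace("'", "\\'");
        return "UNLOAD ('" + escaped + "') "
             + "TO '" + s3Prefix + "' "
             + "CREDENTIALS '" + credentials + "'";
    }

    public static void main(String[] args) {
        // Placeholder bucket, query, and credential values for illustration.
        String sql = buildUnload(
                "select * from events where day = '2014-01-01'",
                "s3://my-bucket/unload/part-",
                "aws_access_key_id=...;aws_secret_access_key=...");
        System.out.println(sql);
        // Execute `sql` via java.sql.Connection/Statement, then either point
        // the EMR job at s3://my-bucket/unload/ or run S3DistCp to copy it to HDFS.
    }
}
```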