
How to use the output of a Redshift query as the input of an EMR job?

Given my limited understanding of Redshift, this is my plan for approaching the problem...

I want to take the results of a query and use them as the input for an EMR job. What is the best way to do this programmatically?

Currently my EMR job takes a flat file from S3 as its input, and I use the Amazon Java SDK to set the job up.

Should I write the output of my Redshift query to S3, point my EMR job there, and then remove the file after the EMR job has completed?

Or do Redshift and the AWS SDK offer a more efficient way to pipe the query results directly from Redshift to EMR, cutting out the S3 step?

Thanks

I recently spoke with members of the Amazon Redshift team; they said a solution for this is in the works.

Dan Ciborowski - MSFT asked Jul 17 '13 21:07



1 Answer

This is pretty easy - no need for Sqoop. Add a Cascading Lingual step at the front of your job which executes a Redshift UNLOAD command to S3:

UNLOAD ('select_statement')
TO 's3://object_path_prefix'
[ WITH ] CREDENTIALS [AS] 'aws_access_credentials' 
[ option [ ... ] ]
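Since the question mentions driving everything from Java, here is a minimal sketch of how the UNLOAD statement could be assembled and sent over plain JDBC. The class name, method names, bucket prefix, and credentials string are hypothetical placeholders, and it assumes a Redshift-compatible JDBC driver is on the classpath. Single quotes inside the query are doubled because the whole SELECT is passed to UNLOAD as a quoted string.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class RedshiftUnloadSketch {

    // Doubles single quotes so the value can sit inside a quoted SQL literal.
    static String escape(String s) {
        return s.replace("'", "''");
    }

    // Builds the UNLOAD statement from the template above.
    // All three arguments are supplied by the caller; nothing here is
    // specific to a real cluster or bucket.
    static String buildUnload(String select, String s3Prefix, String credentials) {
        return "UNLOAD ('" + escape(select) + "') "
             + "TO '" + escape(s3Prefix) + "' "
             + "CREDENTIALS '" + escape(credentials) + "'";
    }

    // Runs the statement over JDBC. The URL, user, and password are
    // assumptions that depend on your cluster and driver.
    static void runUnload(String jdbcUrl, String user, String password,
                          String unloadSql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.execute(unloadSql);
        }
    }
}
```

Calling `buildUnload` first and logging the result before `runUnload` makes it easier to check the quoting by eye. The S3 prefix you pass in is the same location the EMR job would then read from.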

Then you can either process the export directly on S3, or add an S3DistCp step to bring the data onto HDFS first.
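For the S3DistCp option, the step is essentially a command line with a source and a destination. The sketch below only builds that argument list; the class and method names are hypothetical, and it assumes a newer EMR release where `s3-dist-cp` is available as a command (typically launched through `command-runner.jar`). Older AMIs invoked an S3DistCp jar directly, so check what your cluster version expects.

```java
import java.util.Arrays;
import java.util.List;

public class S3DistCpStepSketch {

    // Returns the arguments for an S3DistCp copy from the UNLOAD output
    // prefix on S3 to a directory on HDFS. Both paths come from the caller.
    static List<String> s3DistCpArgs(String s3Source, String hdfsDest) {
        return Arrays.asList("s3-dist-cp", "--src", s3Source, "--dest", hdfsDest);
    }
}
```

With the AWS SDK for Java, a list like this would typically be handed to the step's `HadoopJarStepConfig` via `withArgs(...)`, placed ahead of the existing processing step so the data is on HDFS before the job starts.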

This will be a lot more performant than adding Sqoop, and a lot simpler to maintain.

Alex Dean answered Oct 29 '22 16:10