I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely.

However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a work-around? This is the connection string I've tried:
"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”
To export the data stored in an RDS for PostgreSQL instance to an Amazon S3 bucket, you first need to make sure your RDS instance is running a PostgreSQL version that supports Amazon S3 exports.
You can define a copy-activity in the Data Pipeline interface to extract data from a Postgres RDS instance into S3. Create a data node of the type SqlDataNode and specify the table name and a select query.
I wish AWS would extend the COPY command in RDS PostgreSQL as they did in Redshift, but for now they haven't, so we have to do it ourselves. Install awscli on your EC2 box (it might already be installed by default) and use the aws s3 sync or aws s3 cp commands to copy files from S3 to your local directory.
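For reference, those CLI calls look roughly like this (the bucket name and paths below are placeholders):

# Copy a single file between S3 and the local directory (works in either direction).
aws s3 cp s3://my-bucket/postgres-export/blahs.csv /tmp/blahs.csv
aws s3 cp /tmp/blahs.csv s3://my-bucket/postgres-export/blahs.csv
# Sync a whole S3 prefix to a local directory.
aws s3 sync s3://my-bucket/postgres-export/ /tmp/postgres-export/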
Before you can use Amazon Simple Storage Service with your RDS for PostgreSQL DB instance, you need to install the aws_s3 extension. This extension provides functions for exporting data from an RDS for PostgreSQL DB instance to an Amazon S3 bucket. It also provides functions for importing data from an Amazon S3 bucket.
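A minimal export with that extension might look like the sketch below, run from psql against the RDS instance. The bucket, key and region are placeholders, and the instance needs an IAM role attached that allows writing to that bucket.

-- Install the extension (CASCADE also installs the aws_commons helper extension).
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- Export the result of a query to a CSV object in S3.
SELECT *
FROM aws_s3.query_export_to_s3(
    'SELECT blah_id FROM blahs',
    aws_commons.create_s3_uri('my-bucket', 'postgres-export/blahs.csv', 'eu-west-1'),
    options := 'format csv'
);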
Nowadays you can define a copy-activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface:
1. Create a data node of the type SqlDataNode. Specify the table name and a select query.
2. Set up the database connection by specifying the RDS instance ID (or a JDBC connection string) along with the username, password and database name.
3. Create a data node of the type S3DataNode.
4. Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output.
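In raw pipeline-definition JSON, those pieces would look roughly like the sketch below. This is a minimal, unverified sketch: the driver JAR location, credentials, bucket paths and the MyEC2Resource reference are placeholders, and a JdbcDatabase object (reusing the connection string from the question) stands in for the database connection.

{
  "objects": [
    {
      "id": "PostgresDatabase",
      "type": "JdbcDatabase",
      "connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB",
      "jdbcDriverClass": "org.postgresql.Driver",
      "jdbcDriverJarUri": "s3://my-bucket/drivers/postgresql-42.x.jar",
      "username": "USER",
      "*password": "PASSWORD"
    },
    {
      "id": "SourceTable",
      "type": "SqlDataNode",
      "database": { "ref": "PostgresDatabase" },
      "table": "blahs",
      "selectQuery": "select blah_id from blahs"
    },
    {
      "id": "OutputCsvFormat",
      "type": "CSV"
    },
    {
      "id": "S3Output",
      "type": "S3DataNode",
      "directoryPath": "s3://my-bucket/postgres-export/",
      "dataFormat": { "ref": "OutputCsvFormat" }
    },
    {
      "id": "CopyPostgresToS3",
      "type": "CopyActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "input": { "ref": "SourceTable" },
      "output": { "ref": "S3Output" }
    }
  ]
}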
This doesn't work yet. AWS hasn't built / released the functionality to connect nicely to Postgres. You can do it in a ShellCommandActivity, though. You can write a little Ruby or Python code to do it, drop that in a script on S3 and point to it with scriptUri. You could also just write a psql command to dump the table to a CSV and then pipe that to OUTPUT1_STAGING_DIR with "stage": "true" in that activity node.
Something like this:
{
  "id": "DumpCommand",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "stage": "true",
  "output": { "ref": "S3ForRedshiftDataNode" },
  "command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}
I didn't run this to verify because it's a pain to spin up a pipeline :( so double-check the escaping in the command.
Also look into the new stuff AWS just launched for parameterized templating of data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. It looks like it will allow encryption of arbitrary parameters.
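As a rough idea of how that would change the example above, a pipeline definition can declare parameters and reference them with #{...} expressions; the parameter name below is made up for illustration and the rest is the same DumpCommand sketch:

{
  "parameters": [
    {
      "id": "myRdsPassword",
      "type": "String",
      "description": "Password for the Postgres RDS instance"
    }
  ],
  "objects": [
    {
      "id": "DumpCommand",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "stage": "true",
      "output": { "ref": "S3ForRedshiftDataNode" },
      "command": "PGPASSWORD=#{myRdsPassword} psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
    }
  ]
}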