I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely.

However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a work-around? This is the connection string I've tried:
"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”
To export the data stored in an RDS for PostgreSQL instance to an Amazon S3 bucket, you first need to make sure your RDS instance is running a PostgreSQL version that supports Amazon S3 exports.
You can define a copy-activity in the Data Pipeline interface to extract data from a Postgres RDS instance into S3. Create a data node of the type SqlDataNode and specify the table name and a select query.
I wish AWS would extend the COPY command in RDS PostgreSQL as they did in Redshift, but for now they haven't, so we have to do it ourselves. Install awscli on your EC2 box (it might already be installed by default) and use the aws s3 sync or aws s3 cp commands to copy files from S3 to your local directory.
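For reference, those CLI calls look roughly like this (the bucket name and paths below are placeholders):

# Copy a single file between S3 and the local directory (works in either direction).
aws s3 cp s3://my-bucket/postgres-export/blahs.csv /tmp/blahs.csv
aws s3 cp /tmp/blahs.csv s3://my-bucket/postgres-export/blahs.csv
# Sync a whole S3 prefix to a local directory.
aws s3 sync s3://my-bucket/postgres-export/ /tmp/postgres-export/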
Before you can use Amazon Simple Storage Service with your RDS for PostgreSQL DB instance, you need to install the aws_s3 extension. This extension provides functions for exporting data from an RDS for PostgreSQL DB instance to an Amazon S3 bucket. It also provides functions for importing data from an Amazon S3 bucket.
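A minimal export with that extension might look like the sketch below, run from psql against the RDS instance. The bucket, key and region are placeholders, and the instance needs an IAM role attached that allows writing to that bucket.

-- Install the extension (CASCADE also installs the aws_commons helper extension).
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- Export the result of a query to a CSV object in S3.
SELECT *
FROM aws_s3.query_export_to_s3(
    'SELECT blah_id FROM blahs',
    aws_commons.create_s3_uri('my-bucket', 'postgres-export/blahs.csv', 'eu-west-1'),
    options := 'format csv'
);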
Nowadays you can define a copy-activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface:
1. Create a data node of the type SqlDataNode. Specify the table name and a select query.
2. Set up the database connection by specifying the RDS instance ID (or a JDBC connection string) along with the username, password and database name.
3. Create a data node of the type S3DataNode.
4. Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output.
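In raw pipeline-definition JSON, those pieces would look roughly like the sketch below. This is a minimal, unverified sketch: the driver JAR location, credentials, bucket paths and the MyEC2Resource reference are placeholders, and a JdbcDatabase object (reusing the connection string from the question) stands in for the database connection.

{
  "objects": [
    {
      "id": "PostgresDatabase",
      "type": "JdbcDatabase",
      "connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB",
      "jdbcDriverClass": "org.postgresql.Driver",
      "jdbcDriverJarUri": "s3://my-bucket/drivers/postgresql-42.x.jar",
      "username": "USER",
      "*password": "PASSWORD"
    },
    {
      "id": "SourceTable",
      "type": "SqlDataNode",
      "database": { "ref": "PostgresDatabase" },
      "table": "blahs",
      "selectQuery": "select blah_id from blahs"
    },
    {
      "id": "OutputCsvFormat",
      "type": "CSV"
    },
    {
      "id": "S3Output",
      "type": "S3DataNode",
      "directoryPath": "s3://my-bucket/postgres-export/",
      "dataFormat": { "ref": "OutputCsvFormat" }
    },
    {
      "id": "CopyPostgresToS3",
      "type": "CopyActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "input": { "ref": "SourceTable" },
      "output": { "ref": "S3Output" }
    }
  ]
}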
This doesn't work yet. AWS hasn't built / released the functionality to connect nicely to Postgres. You can do it in a ShellCommandActivity, though. You can write a little Ruby or Python code to do it, drop that in a script on S3 and point to it with scriptUri. You could also just write a psql command to dump the table to a CSV and then pipe that to OUTPUT1_STAGING_DIR with "stage": "true" in that activity node.
Something like this:
{
  "id": "DumpCommand",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "stage": "true",
  "output": { "ref": "S3ForRedshiftDataNode" },
  "command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}
I didn't run this to verify because it's a pain to spin up a pipeline :( so double-check the escaping in the command.
Also look into the new stuff AWS just launched for parameterized templating of data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. It looks like it will allow encryption of arbitrary parameters.
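As a rough idea of how that would change the example above, a pipeline definition can declare parameters and reference them with #{...} expressions; the parameter name below is made up for illustration and the rest is the same DumpCommand sketch:

{
  "parameters": [
    {
      "id": "myRdsPassword",
      "type": "String",
      "description": "Password for the Postgres RDS instance"
    }
  ],
  "objects": [
    {
      "id": "DumpCommand",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "stage": "true",
      "output": { "ref": "S3ForRedshiftDataNode" },
      "command": "PGPASSWORD=#{myRdsPassword} psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
    }
  ]
}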