Is it possible to dump an RDS database to S3 using AWS Data Pipeline?

Basically, I want to pg_dump my RDS database to S3 using AWS Data Pipeline.

I am not 100% sure this is possible. I got as far as the SqlDataNode, which wants a selectQuery, and at that point I am not sure what to do.

Below is my template so far:

AWSTemplateFormatVersion: "2010-09-09"

Description: RDS to S3 Dump

Parameters:
  RDSInstanceID:
    Description: "Instance ID of RDS to Dump from"
    Type: String
  DatabaseName:
    Description: "Name of the Database to Dump"
    Type: String
  Username:
    Description: "Database Username"
    Type: String
  Password:
    Description: "Database password"
    Type: String
    NoEcho: true

Resources:
  RDSToS3Dump:
    Type: "AWS::DataPipeline::Pipeline"
    Properties:
      Name: "RDSToS3Dump"
      Description: "Pipeline to back up RDS data to S3"
      Activate: true
      ParameterObjects:
        -
          name: "SourceRDSTable"
          type: "SqlDataNode"
          Database: !Ref DatabaseName
        -
          name: !Ref DatabaseName
          type: "RdsDatabase"
          databaseName: !Ref DatabaseName
          username: !Ref Username
          password: !Ref Password
          rdsInstanceId: !Ref RDSInstanceID
        -
          name: "S3OutputLocation"
          type: "S3DataNode"
          filePath: #TODO: S3 Bucket here parameterized? Will actually need to create one.
        -
          name: "RDStoS3CopyActivity"
          type: "CopyActivity"
          input: "SourceRDSTable"
          output: "S3OutputLocation"
          #TODO: do we need a runsOn?

asked May 15 '17 23:05

Jesse Whitham

2 Answers

As mentioned in another answer, AWS Data Pipeline only allows you to dump tables and not the entire DB. If you really want to use pg_dump to dump the entire contents of your DB to S3 using AWS CloudFormation, you can use Lambda-backed custom resources. Going down that route, you'll have to write a Lambda function that:

Connects to the DB
Dumps your DB with pg_dump
Uploads it to S3
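On the CloudFormation side, the custom resource might be declared roughly as below. This is a sketch under several assumptions: PgDumpFunction is a hypothetical AWS::Lambda::Function defined elsewhere in the same template, DBHost is a hypothetical extra parameter holding the RDS endpoint address, and the property names passed to the function (DBHost, Bucket, and so on) are made up for illustration, since CloudFormation simply forwards whatever sits under Properties to the function as ResourceProperties. Note also that the standard Lambda runtimes do not ship pg_dump, so the binary has to be packaged with the function (for example in a layer).

```yaml
Parameters:
  DBHost:
    Description: "Endpoint address of the RDS instance (hypothetical extra parameter)"
    Type: String

Resources:
  # Bucket that receives the dump file.
  DumpBucket:
    Type: AWS::S3::Bucket

  # Custom resource: on stack create/update/delete, CloudFormation invokes the
  # Lambda function given in ServiceToken and passes the remaining properties
  # to it as ResourceProperties. PgDumpFunction (hypothetical, defined elsewhere)
  # is expected to run pg_dump, upload the result to DumpBucket, and then report
  # SUCCESS or FAILED to the pre-signed ResponseURL it receives in the event.
  RDSDumpToS3:
    Type: Custom::PgDumpToS3
    Properties:
      ServiceToken: !GetAtt PgDumpFunction.Arn
      DBHost: !Ref DBHost
      DatabaseName: !Ref DatabaseName
      Username: !Ref Username
      Password: !Ref Password
      Bucket: !Ref DumpBucket
```

Because a custom resource only fires when the stack is created, updated or deleted, this gives you a dump at deploy time rather than a recurring backup; the function should typically skip the dump when the event's RequestType is Delete, and the whole dump has to finish within Lambda's 15-minute maximum execution time.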

answered Oct 11 '22 12:10

Aditya

With Data Pipeline, I believe you can only dump individual tables, not the whole database the way pg_dump does.

Have you looked at the docs? selectQuery just needs a SQL statement selecting the data you want to dump, e.g. "select * from mytable". This page may help: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-sqldatanode.html

  -
    name: "SourceRDSTable"
    type: "SqlDataNode"
    Database: !Ref DatabaseName
    table: "mytable"
    selectQuery: "select * from #{table}"
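For reference, an AWS::DataPipeline::Pipeline resource does not accept pipeline objects in the flat form used in the question: they go under PipelineObjects (ParameterObjects is for declaring pipeline parameters), each as an Id, a Name and a list of Fields, where every field is a Key with either a StringValue or a RefValue naming another object's Id. The fragment below is a sketch of the question's pipeline in that shape, not a tested template. It assumes the default Data Pipeline IAM roles already exist in the account, uses a hypothetical BackupBucket parameter for the S3 location, and copies a single table, mytable; field names are worth checking against the Data Pipeline object reference. It also touches the runsOn TODO: an activity needs either runsOn (a resource such as an Ec2Resource) or a workerGroup, and here it runs on a hypothetical Ec2Resource called DumpInstance.

```yaml
# Fragment: goes under Properties of the AWS::DataPipeline::Pipeline resource.
# BackupBucket is a hypothetical String parameter holding an existing bucket name.
PipelineObjects:
  - Id: Default
    Name: Default
    Fields:
      - Key: type
        StringValue: Default
      - Key: scheduleType
        StringValue: ondemand              # run once per activation
      - Key: role                          # default roles must already exist
        StringValue: DataPipelineDefaultRole
      - Key: resourceRole
        StringValue: DataPipelineDefaultResourceRole
  - Id: SourceDatabase
    Name: SourceDatabase
    Fields:
      - Key: type
        StringValue: RdsDatabase
      - Key: rdsInstanceId
        StringValue: !Ref RDSInstanceID
      - Key: databaseName
        StringValue: !Ref DatabaseName
      - Key: username
        StringValue: !Ref Username
      - Key: "*password"                   # Data Pipeline spells this field with a leading *
        StringValue: !Ref Password
  - Id: SourceRDSTable
    Name: SourceRDSTable
    Fields:
      - Key: type
        StringValue: SqlDataNode
      - Key: database
        RefValue: SourceDatabase           # pipeline reference by object Id, not a CloudFormation !Ref
      - Key: table
        StringValue: mytable
      - Key: selectQuery
        StringValue: "select * from #{table}"
  - Id: S3OutputLocation
    Name: S3OutputLocation
    Fields:
      - Key: type
        StringValue: S3DataNode
      - Key: directoryPath
        StringValue: !Sub "s3://${BackupBucket}/mytable/"
  - Id: DumpInstance
    Name: DumpInstance
    Fields:
      - Key: type
        StringValue: Ec2Resource
      - Key: terminateAfter
        StringValue: "2 Hours"             # safety net so the instance is not left running
  - Id: RDStoS3CopyActivity
    Name: RDStoS3CopyActivity
    Fields:
      - Key: type
        StringValue: CopyActivity
      - Key: input
        RefValue: SourceRDSTable
      - Key: output
        RefValue: S3OutputLocation
      - Key: runsOn
        RefValue: DumpInstance
```

Dumping several tables this way means one SqlDataNode, S3DataNode and CopyActivity per table, which is why this approach is a per-table export rather than a pg_dump replacement.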

answered Oct 11 '22 14:10

NHol
