Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to dump a RDS database to S3 using AWS Data Pipeline?

Tags:

Basically I want to pg_dump my RDS database to S3 using AWS Data Pipeline,

I am not 100% sure if this is possible I got up to the stage where the SqlDataNode wants a selectQuery at which point i am wondering what to do.

Below is my template so far:

AWSTemplateFormatVersion: "2010-05-15"

Description: RDS to S3 Dump

Parameters:
  RDSInstanceID:
    Description: "Instance ID of RDS to Dump from"
  DatabaseName:
    Description: "Name of the Database to Dump"
    Type: String
  Username:
    Description: "Database Username"
    Type: String
  Password:
    Description: "Database password"
    Type: String
    NoEcho: true

RDSToS3Dump:
  Type: "AWS::DataPipeline::Pipeline"
  Properties:
    Name: "RDSToS3Dump"
    Description: "Pipeline to backup RDS data to S3"
    Activate: true
    ParameterObjects:
      -
        name: "SourceRDSTable"
        type: "SqlDataNode"
        Database: !Ref DatabaseName
      -
        name: !Ref DatabaseName
        type: "RdsDatabase"
        databaseName: !Ref DatabaseName
        username: !Ref Username
        password: !Ref Password
        rdsInstanceId: !Ref RDSInstanceID
      -
        name: "S3OutputLocation"
        type: "S3DataNode"
        filePath: #TODO: S3 Bucket here parameterized? Will actually need to create one.
      -
        name: "RDStoS3CopyActivity"
        type: "CopyActivity"
        input: "SourceRDSTable"
        output: "S3OutputLocation"
        #TODO: do we need a runsOn?
like image 398
Jesse Whitham Avatar asked May 15 '17 23:05

Jesse Whitham


People also ask

Can I transfer RDS data to S3 bucket?

Login to ec2 server, run the command for login to mysql database using master username and password credentials. Then import the dump database to rds using the command line. Start the data migration task by modifying the task created, once the replication is started it will replicate all data to s3 bucket created.

Does RDS store data in S3?

RDS is a managed database service. While it might internally store snapshots on an object storage service like S3, it won't expose that to you directly. It certainly won't use S3 for the actual primary RDS storage layer, of course, for performance reasons.


2 Answers

As mentioned in another answer, AWS Data Pipeline only allows you to dump tables and not the entire DB. If you really want to use pg_dump to dump the entire contents of your DB to S3 using AWS CloudFormation, you can use Lambda-backed custom resources. Going down that route, you'll have to write a Lambda function that:

  • Connects to the DB
  • Takes the dump of your DB using pg_dump
  • Uploads it to S3
like image 96
Aditya Avatar answered Oct 11 '22 12:10

Aditya


Using Data Pipeline I believe you can only dump tables rather than the whole db as with pg_dump.

Have you looked at the docs as selectQuery just requires a SQL statement for what you want to dump, i.e. "select * from mytable"? Maybe this helps. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-sqldatanode.html

  -
    name: "SourceRDSTable"
    type: "SqlDataNode"
    Database: !Ref DatabaseName
    table: "mytable"
    selectQuery: "select * from #{table}"
like image 34
NHol Avatar answered Oct 11 '22 14:10

NHol