Basically I want to pg_dump
my RDS database to S3 using AWS Data Pipeline,
I am not 100% sure if this is possible I got up to the stage where the SqlDataNode
wants a selectQuery
at which point i am wondering what to do.
Below is my template so far:
AWSTemplateFormatVersion: "2010-05-15"
Description: RDS to S3 Dump
Parameters:
RDSInstanceID:
Description: "Instance ID of RDS to Dump from"
DatabaseName:
Description: "Name of the Database to Dump"
Type: String
Username:
Description: "Database Username"
Type: String
Password:
Description: "Database password"
Type: String
NoEcho: true
RDSToS3Dump:
Type: "AWS::DataPipeline::Pipeline"
Properties:
Name: "RDSToS3Dump"
Description: "Pipeline to backup RDS data to S3"
Activate: true
ParameterObjects:
-
name: "SourceRDSTable"
type: "SqlDataNode"
Database: !Ref DatabaseName
-
name: !Ref DatabaseName
type: "RdsDatabase"
databaseName: !Ref DatabaseName
username: !Ref Username
password: !Ref Password
rdsInstanceId: !Ref RDSInstanceID
-
name: "S3OutputLocation"
type: "S3DataNode"
filePath: #TODO: S3 Bucket here parameterized? Will actually need to create one.
-
name: "RDStoS3CopyActivity"
type: "CopyActivity"
input: "SourceRDSTable"
output: "S3OutputLocation"
#TODO: do we need a runsOn?
Login to ec2 server, run the command for login to mysql database using master username and password credentials. Then import the dump database to rds using the command line. Start the data migration task by modifying the task created, once the replication is started it will replicate all data to s3 bucket created.
RDS is a managed database service. While it might internally store snapshots on an object storage service like S3, it won't expose that to you directly. It certainly won't use S3 for the actual primary RDS storage layer, of course, for performance reasons.
As mentioned in another answer, AWS Data Pipeline only allows you to dump tables and not the entire DB. If you really want to use pg_dump
to dump the entire contents of your DB to S3 using AWS CloudFormation
, you can
use Lambda-backed custom resources. Going down that route, you'll have to write a Lambda function that:
pg_dump
Using Data Pipeline I believe you can only dump tables rather than the whole db as with pg_dump.
Have you looked at the docs as selectQuery just requires a SQL statement for what you want to dump, i.e. "select * from mytable"? Maybe this helps. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-sqldatanode.html
-
name: "SourceRDSTable"
type: "SqlDataNode"
Database: !Ref DatabaseName
table: "mytable"
selectQuery: "select * from #{table}"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With