
Import Postgres data into RDS using S3 and aws_s3

I'm having a hard time importing data from S3 into an RDS postgres instance. According to the docs, you can use this syntax:

aws_s3.table_import_from_s3 (
   table_name text, 
   column_list text, 
   options text, 
   bucket text, 
   file_path text, 
   region text, 
   access_key text, 
   secret_key text, 
   session_token text 
) 

So, in pgAdmin, I did this:

SELECT aws_s3.table_import_from_s3(
  'contacts_1', 
  'firstname,lastname,imported', 
  '(format csv)',
  'com.foo.mybucket', 
  'mydir/subdir/myfile.csv', 
  'us-east-2',
  'AKIAYYXUMxxxxxxxxxxx',
  '3zB4S5jb1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
);

I also tried it with an explicit NULL for the last parameter.

The error message I get is:

NOTICE:  CURL error code: 51 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE:  CURL error code: 51 when attempting to validate pre-signed URL, 0 attempt(s) remaining

ERROR:  Unable to generate pre-signed url, look at engine log for details.
SQL state: XX000

I checked the server logs and there was no further information.

I have triple-checked the correctness of all the parameters. How do I make this work?

UPDATE:

I can confirm that I can do an s3.getObject() in the Java aws sdk using these same credentials.

asked Oct 29 '19 by ccleve

1 Answer

The main issue here is that you need to 1) attach an IAM role to the RDS instance that grants access to the S3 bucket, and 2) add an S3 endpoint to the VPC where the RDS instance runs, to allow communication.

This is the procedure I followed to make it work, using AWS CLI commands in a shell (take care to set the environment variables involved to the proper values); hope it can help:

  1. Create the IAM role:
$ aws iam create-role \
    --role-name $ROLE_NAME \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
  2. Create the IAM policy that will be attached to the IAM role (note the double quotes around the policy document, so that $BUCKET_NAME is actually expanded by the shell):
$ aws iam create-policy \
    --policy-name $POLICY_NAME \
    --policy-document "{\"Version\": \"2012-10-17\", \"Statement\": [{\"Sid\": \"s3import\", \"Action\": [\"s3:GetObject\", \"s3:ListBucket\"], \"Effect\": \"Allow\", \"Resource\": [\"arn:aws:s3:::${BUCKET_NAME}\", \"arn:aws:s3:::${BUCKET_NAME}/*\"]}]}"
  3. Attach the policy to the role:
$ aws iam attach-role-policy \
    --policy-arn arn:aws:iam::$AWS_ACCOUNT_ID:policy/$POLICY_NAME \
    --role-name $ROLE_NAME
  4. Add the role to a specific instance - this step needs to be repeated for every new instance:
$ aws rds add-role-to-db-instance \
    --db-instance-identifier $RDS_INSTANCE_NAME \
    --feature-name s3Import \
    --role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_NAME \
    --region $REGION
  5. Create the VPC endpoint for the S3 service:
$ aws ec2 create-vpc-endpoint \
    --vpc-id $VPC_ID \
    --service-name com.amazonaws.$REGION.s3 \
    --route-table-ids $ROUTE_TABLE_ID
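
With the role attached and the endpoint in place, the import should no longer need explicit credentials: instead of passing access and secret keys, you can pass an S3 URI built with aws_commons.create_s3_uri and let RDS authenticate through the role. A sketch, reusing the bucket, path, and region from the question:

-- No access_key/secret_key arguments: the s3Import IAM role attached
-- to the instance is used for authentication instead.
SELECT aws_s3.table_import_from_s3(
  'contacts_1',
  'firstname,lastname,imported',
  '(format csv)',
  aws_commons.create_s3_uri(
    'com.foo.mybucket',
    'mydir/subdir/myfile.csv',
    'us-east-2'
  )
);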

The route table id of the VPC where the endpoint is created can be retrieved with:

$ aws ec2 describe-route-tables | jq -r '.RouteTables[] | "\(.VpcId) \(.RouteTableId)"'
answered Sep 27 '22 by Giuseppe Broccolo