With the latest Aurora update (1.8), the command LOAD DATA FROM S3
was introduced. Has anyone gotten this to work? After upgrading to 1.8, I followed the setup guide here to create the role that allows RDS to access S3.
After rebooting the server and trying to run the command
LOAD DATA FROM S3 PREFIX 's3://<bucket_name>/prefix' INTO TABLE table_name
in SQL Workbench/J, I get the errors:
Warnings:
S3 API returned error: Missing Credentials: Cannot instantiate S3 Client
S3 API returned error: Failed to instantiate S3 Client
Internal error: Unable to initialize S3Stream
Are there any additional steps required? Can I only run this from the SDK? I don't see that mentioned anywhere in the documentation.
If you don't specify a region value, Aurora loads your file from Amazon S3 in the same region as your DB cluster. bucket-name is the name of the Amazon S3 bucket that contains the data to load; object prefixes that identify a virtual folder path are supported.
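For reference, a minimal sketch of that syntax with the region embedded in the URI (the bucket, path, table, and column names here are placeholders, not from the thread):

-- Region goes in the URI scheme; omit it to default to the cluster's region.
LOAD DATA FROM S3 's3-us-east-1://my-bucket/data/load_01.txt'
    INTO TABLE my_table
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (col1, col2);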
Loading data into a table from text files in an Amazon S3 bucket is available for Amazon Aurora MySQL version 1.8 and later. For more information about Aurora MySQL versions, see Database engine updates for Amazon Aurora MySQL. This feature currently isn't available for Aurora Serverless clusters.
Create an AWS Identity and Access Management (IAM) policy that provides the bucket and object permissions that allow your Aurora MySQL DB cluster to access Amazon S3. For instructions, see Creating an IAM policy to access Amazon S3 resources.
Go to IAM -> Roles in the AWS console, select the role you are using, click 'Attach policy', then scroll down to the S3 policies and pick one. I reached out to the Amazon Aurora team and they confirmed there are edge cases where some servers have this issue.
I had the same issue. I tried adding AmazonS3FullAccess to the IAM role that my RDS instances were using... no joy.
After poking around, I went into the RDS console, to Clusters, selected my Aurora cluster, and clicked Manage IAM Roles. It gave me a drop-down; I selected the IAM role (the same one the individual instances were using).
Once I did that, all was well and data load was nice and fast.
So, there are (for us) 5 steps/components:
1) The S3 bucket and bucket policy to allow a user to upload the object
{
  "Version": "2012-10-17",
  "Id": "Policy1453918146601",
  "Statement": [
    {
      "Sid": "Stmt1453917898368",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account id>:<user/group/role>/<IAM User/Group/Role>"
      },
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::<bucket name>/*"
    }
  ]
}
The "Principal" would be whatever IAM user, group or role will be uploading the data files to the bucket so that the RDS instance can import the data.
2) The IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1486490368000",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket name>",
        "arn:aws:s3:::<bucket name>/*"
      ]
    }
  ]
}
This is pretty simple with the Policy Generator. Note that s3:ListBucket applies to the bucket itself, which is why the bucket ARN (without the /*) is listed alongside the object ARN.
3) Create the IAM Role:
This role should have the IAM policy above attached, and its trust relationship must allow the RDS service (rds.amazonaws.com) to assume it. You can probably use an inline policy instead if you're not going to reuse this policy for other roles down the line, but I like having a defined policy that I can reference later if the need arises.
4) Configure a parameter group for your cluster/instances that sets the aws_default_s3_role parameter to the ARN of the role from #3 above.
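To sanity-check this step from a SQL session, something like the following should show the ARN once the parameter group change has taken effect (hedged: this assumes aws_default_s3_role is exposed as a global variable, and a cluster reboot may be needed first):

-- Should return the role ARN set in the parameter group (illustrative).
SHOW VARIABLES LIKE 'aws_default_s3_role';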
5) Configure the Aurora cluster by going to Clusters, selecting your cluster, selecting Manage IAM Roles, and setting the IAM role for your DB cluster.
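Once all five pieces are in place, a load like the one in the question should succeed; a sketch with placeholder bucket, prefix, and table names:

-- End-to-end test of the setup above; names are placeholders.
LOAD DATA FROM S3 PREFIX 's3://my-bucket/load/'
    INTO TABLE my_table
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';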
At least for me, these steps worked like a charm.
Hope that helps!