Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Amazon Aurora 1.8 Load Data From S3 - Cannot Instantiate S3 Client

With the latest Aurora update (1.8), the command LOAD DATA FROM S3 was introduced. Has anyone gotten this to work? After upgrading to 1.8, I followed the setup guide Here to create the Role to allow access from RDS to S3.

After rebooting the server and trying to run the command

LOAD DATA FROM S3 PREFIX 's3://<bucket_name>/prefix' INTO TABLE table_name

in SQL Workbench/J, I get the errors:

Warnings:
S3 API returned error: Missing Credentials: Cannot instantiate S3 Client
S3 API returned error: Failed to instantiate S3 Client
Internal error: Unable to initialize S3Stream

Are there any additional steps required? Can I only run this from the SDK? I don't see that mentioned anywhere in the documents

like image 667
aarbor Avatar asked Oct 25 '16 18:10

aarbor


People also ask

How does Aurora load files from Amazon S3?

If you don't specify a region value, then Aurora loads your file from Amazon S3 in the same region as your DB cluster. bucket-name – The name of the Amazon S3 bucket that contains the data to load. Object prefixes that identify a virtual folder path are supported.

Can I load data into a table from an Amazon S3 bucket?

Loading data into a table from text files in an Amazon S3 bucket is available for Amazon Aurora MySQL version 1.8 and later. For more information about Aurora MySQL versions, see Database engine updates for Amazon Aurora MySQL . This feature currently isn't available for Aurora Serverless clusters.

How do I configure Amazon S3 for Aurora MySQL db cluster?

Create an AWS Identity and Access Management (IAM) policy that provides the bucket and object permissions that allow your Aurora MySQL DB cluster to access Amazon S3. For instructions, see Creating an IAM policy to access Amazon S3 resources .

How do I attach an S3 policy to an AWS role?

Go to IAM -> Roles in the AWS console, select the role you are using, click 'attach policy', then scroll way down to the S3 policies and pick one. I reached out to Amazon Aurora team and they confirmed there are edge cases with some of the servers having this issue.


1 Answers

I had the same issue. I tried adding AmazonS3FullAccess to the IAM role that my RDS instances were using...no joy.

After poking around, I went into the RDS console, to Clusters. Selected my Aurora cluster and clicked Manage IAM Roles. It gave me a drop-down, I selected the IAM role (same one that the individual instances were using).

Once I did that, all was well and data load was nice and fast.

So, there are (for us) 5 steps/components:

1) The S3 bucket and bucket policy to allow a user to upload the object

{
    "Version": "2012-10-17",
    "Id": "Policy1453918146601",
    "Statement": [
        {
            "Sid": "Stmt1453917898368",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account id>:<user/group/role>/<IAM User/Group/Role>"
            },
            "Action": [
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket name>/*"
        }
    ]
}

The "Principal" would be whatever IAM user, group or role will be uploading the data files to the bucket so that the RDS instance can import the data.

2) The IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1486490368000",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket name>/*"
            ]
        }
    ]
}

This is pretty simple with the Policy Generator.

3) Create the IAM Role:

This role should be assigned the IAM policy above. You can probably do an inline policy, too, if you're not going to use this policy for other roles down the line, but I like the idea of having a defined policy that I can reference later if I have a need.

4) Configure a Parameter Group that your cluster/instances will use to set the aws_default_s3_role value to the ARN of the role from #3 above.

5) Configure the Aurora Cluster by going to Clusters, selecting your cluster, selecting Manage IAM Roles and setting the IAM Role for your DB Cluster

At least for me, these steps worked like a charm.

Hope that helps!

like image 196
Dan Carrington Avatar answered Sep 21 '22 15:09

Dan Carrington