The Beijing (China) region is quite unusual in that almost everything about it is separate from AWS Global. For instance, to use the AWS CLI to list objects we have to specify both region and endpoint-url:
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3 ls s3://AN_AWS_CN_BUCKET/
My question is: how can I sync data between a Beijing (China) bucket and a global one? To begin with, a global bucket is not recognised with the region and endpoint-url specified above. For example,
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3 cp s3://AN_AWS_CN_BUCKET/ s3://AN_AWS_IRELAND_BUCKET/
will give
fatal error: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
(The destination bucket does exist.)
So far I've explored CLI arguments and profiles defined in the .aws/config file. Maybe it is possible to use multiple profiles in a single command, but it doesn't seem possible to configure endpoint-url in .aws/config, as it isn't mentioned in the documentation.
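For reference, the relevant part of my ~/.aws/config looks roughly like this (the global profile name AN_AWS_GLOBAL_PROFILE is just a placeholder; the access keys themselves live in ~/.aws/credentials):

[profile AN_AWS_CN_PROFILE]
region = cn-north-1
output = json

[profile AN_AWS_GLOBAL_PROFILE]
region = eu-west-1
output = json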
For now I simply copy the files to a local disk and then upload them, which will become an issue as the dataset grows quickly. Using EMR may resolve the scaling issue, but it adds an extra layer of complexity. I wonder if there's a better and possibly easier solution? Thank you.
An S3 bucket exists in one region, not in multiple regions, but you can access that bucket from anywhere.
Since AWS China Regions are operated separately from other AWS Regions, including account credentials that are unique to AWS China Accounts, Amazon S3 Replication is not available between AWS China Regions and AWS Regions outside of China.
While the namespace for buckets is global, S3 (like most other AWS services) runs in each AWS region (see the AWS Global Infrastructure page for more information).
The BJS/ZHY regions are under a different partition from the classic regions (aws-cn for BJS/ZHY, aws for the others).
The partition boundary stops accounts in BJS and in the classic regions from talking to each other: they cannot understand ARNs from the other partition, and they cannot whitelist or grant permissions to accounts from the other partition.
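For illustration, the partition is the second field of an ARN, so the same kind of bucket carries a different prefix in each partition:

arn:aws:s3:::AN_AWS_IRELAND_BUCKET      (aws partition, e.g. eu-west-1)
arn:aws-cn:s3:::AN_AWS_CN_BUCKET        (aws-cn partition, cn-north-1 / cn-northwest-1)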
There are also some issues with S3 in the China regions in that certain S3 URLs/IPs are blocked by local network providers.
In my experience, the best way to achieve this is either to create a Lambda function that uploads the S3 object(s) to buckets in the classic partition, or to configure an SNS -> SQS -> SQS-listener pattern.
The problem is that the AWS API expects all operations to be run within one session, which is bound to a single user or role. So if your two buckets require separate credentials for downloading and uploading respectively, there is no way to unite both permissions in one session.
There are a few ways around this limitation, each with its own drawbacks:
a) Make each file temporarily public for the duration of the transfer. This requires some logic in the form of a script or application. Essentially you assume a user or role within the source bucket's account and change the ACL of the file you are about to copy. You don't need to make the bucket listable, so an attacker would need to know the exact path in order to access your file during the transfer window. With a role or user of the target account you then read the now-public file and save it to the target bucket. Repeat this once for each file.
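A rough sketch of (a) with the AWS CLI; the object key path/to/object and the profile names are placeholders, and instead of signing the read with the target account it fetches the temporarily public object with an unsigned request (--no-sign-request), since credentials from one partition cannot sign requests against the other:

# with the source (China) account's profile, make the object public for the transfer window
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3api put-object-acl --acl public-read \
--bucket AN_AWS_CN_BUCKET --key path/to/object

# read the now-public object anonymously
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--no-sign-request \
s3 cp s3://AN_AWS_CN_BUCKET/path/to/object ./object

# save it to the target bucket with the target account's profile
aws --region eu-west-1 \
--profile AN_AWS_GLOBAL_PROFILE \
s3 cp ./object s3://AN_AWS_IRELAND_BUCKET/path/to/object

# revert the ACL with the source account's profile
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3api put-object-acl --acl private \
--bucket AN_AWS_CN_BUCKET --key path/to/object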
b) Use a transfer instance. The AWS API doesn't allow assuming two roles/users at once, but you can assume the source bucket account's role/user first, copy all required files to local disk, and then upload to the target bucket using a second set of credentials. This mediator instance can be an EC2 instance or your local machine (if you have the bandwidth and volume capacities).
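A minimal sketch of (b) with aws s3 sync, assuming the profile names from the question plus a made-up AN_AWS_GLOBAL_PROFILE holding the global account's credentials and a local staging directory:

# pull from the China bucket with the China account's profile
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3 sync s3://AN_AWS_CN_BUCKET/ ./staging/

# push to the global bucket with the global account's profile
aws --region eu-west-1 \
--profile AN_AWS_GLOBAL_PROFILE \
s3 sync ./staging/ s3://AN_AWS_IRELAND_BUCKET/

Because sync only transfers objects that differ, re-runs move just the new or changed files, so the staging cost grows with the delta rather than with the whole dataset.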