The Beijing (China) region is quite unusual in that almost everything about it is separate from AWS Global. For instance, to use the AWS CLI to list objects we have to specify both region and endpoint-url:
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3 ls s3://AN_AWS_CN_BUCKET/
My question is: how can I sync data between a Beijing (China) bucket and a global one? To begin with, a global bucket is not recognised with the region and endpoint-url specified above. For example,
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3 cp s3://AN_AWS_CN_BUCKET/ s3://AN_AWS_IRELAND_BUCKET/
will give
fatal error: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
(The destination bucket does exist.)
So far I've explored CLI arguments and profiles defined in the .aws/config file. Maybe it is possible to use multiple profiles in a single command, but it doesn't seem possible to configure endpoint-url in .aws/config, as it isn't mentioned in the documentation.
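For reference, the relevant part of my ~/.aws/config looks roughly like this (the global profile name AN_AWS_GLOBAL_PROFILE is just a placeholder; the access keys themselves live in ~/.aws/credentials):

[profile AN_AWS_CN_PROFILE]
region = cn-north-1
output = json

[profile AN_AWS_GLOBAL_PROFILE]
region = eu-west-1
output = json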
For now I simply copy the files to a local disk and then upload them, which will become an issue as the dataset grows quickly. Using EMR may resolve the scaling issue, but it adds an extra layer of complexity. I wonder if there's a better and possibly easier solution? Thank you.
An S3 bucket exists in one region, not in multiple regions, but you can access that bucket from anywhere.
Since AWS China Regions are operated separately from other AWS Regions, including account credentials that are unique to AWS China Accounts, Amazon S3 Replication is not available between AWS China Regions and AWS Regions outside of China.
While the namespace for buckets is global, S3 (like most other AWS services) runs in each AWS region (see the AWS Global Infrastructure page for more information).
The BJS/ZHY regions are under a different partition from the classic regions (aws-cn for BJS/ZHY, aws for the others).
The partition boundary stops accounts in BJS and in the classic regions from talking to each other: they cannot understand ARNs from the other partition, and they cannot whitelist or grant permissions to accounts from the other partition.
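For illustration, the partition is the second field of an ARN, so the same kind of bucket carries a different prefix in each partition:

arn:aws:s3:::AN_AWS_IRELAND_BUCKET      (aws partition, e.g. eu-west-1)
arn:aws-cn:s3:::AN_AWS_CN_BUCKET        (aws-cn partition, cn-north-1 / cn-northwest-1)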
There are also some issues with S3 in the China regions in that certain S3 URLs/IPs are blocked by local network providers.
In my experience, the best way to achieve this is either to create a Lambda function that uploads the S3 object(s) to buckets in the classic partition, or to configure an SNS -> SQS -> SQS-listener pattern.
The problem is that the AWS API expects all operations to be run within one session, which is bound to a single user or role. So if your two buckets require separate credentials for downloading and uploading respectively, there is no way to unite both permissions in one session.
There are a few ways around this limitation, each with its own drawbacks:
a) Make each file temporarily public for the duration of the transfer. This requires some logic in the form of a script or application. Essentially you assume a user or role within the source bucket's account and change the ACL of the file you are about to copy. You don't need to make the bucket listable, so an attacker would need to know the exact path in order to access your file during the transfer window. With a role or user of the target account you then read the now-public file and save it to the target bucket. Repeat this once for each file.
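A rough sketch of (a) with the AWS CLI; the object key path/to/object and the profile names are placeholders, and instead of signing the read with the target account it fetches the temporarily public object with an unsigned request (--no-sign-request), since credentials from one partition cannot sign requests against the other:

# with the source (China) account's profile, make the object public for the transfer window
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3api put-object-acl --acl public-read \
--bucket AN_AWS_CN_BUCKET --key path/to/object

# read the now-public object anonymously
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--no-sign-request \
s3 cp s3://AN_AWS_CN_BUCKET/path/to/object ./object

# save it to the target bucket with the target account's profile
aws --region eu-west-1 \
--profile AN_AWS_GLOBAL_PROFILE \
s3 cp ./object s3://AN_AWS_IRELAND_BUCKET/path/to/object

# revert the ACL with the source account's profile
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3api put-object-acl --acl private \
--bucket AN_AWS_CN_BUCKET --key path/to/object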
b) Use a transfer instance. The AWS API doesn't allow assuming two roles/users at once, but you can assume the source bucket account's role/user first, copy all required files to local disk, and then upload to the target bucket using a second set of credentials. This mediator instance can be an EC2 instance or your local machine (if you have the bandwidth and volume capacities).
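A minimal sketch of (b) with aws s3 sync, assuming the profile names from the question plus a made-up AN_AWS_GLOBAL_PROFILE holding the global account's credentials and a local staging directory:

# pull from the China bucket with the China account's profile
aws --region cn-north-1 \
--endpoint-url https://s3.cn-north-1.amazonaws.com.cn \
--profile AN_AWS_CN_PROFILE \
s3 sync s3://AN_AWS_CN_BUCKET/ ./staging/

# push to the global bucket with the global account's profile
aws --region eu-west-1 \
--profile AN_AWS_GLOBAL_PROFILE \
s3 sync ./staging/ s3://AN_AWS_IRELAND_BUCKET/

Because sync only transfers objects that differ, re-runs move just the new or changed files, so the staging cost grows with the delta rather than with the whole dataset.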