Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest / best way copy data between S3 to EC2?

Tags:

I have a fairly large amount of data (~30G, split into ~100 files) I'd like to transfer between S3 and EC2: when I fire up the EC2 instances I'd like to copy the data from S3 to EC2 local disks as quickly as I can, and when I'm done processing I'd like to copy the results back to S3.

I'm looking for a tool that'll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I'm not looking for pointers to basic libraries; I'm looking for something fast and reliable.

like image 510
Parand Avatar asked Apr 14 '09 20:04

Parand


People also ask

How fast is S3 to EC2?

Resolution. Traffic between Amazon EC2 and Amazon S3 can leverage up to 100 Gbps of bandwidth to VPC endpoints and public IPs in the same Region.

How fast is S3 copy?

S3-bucket-copying performance can exceed 8 gigabytes per second.


2 Answers

Unfortunately, Adam's suggestion won't work as his understanding of EBS is wrong (although I wish he was right and often thought myself it should work that way)... as EBS has nothing to do with S3, but it will only give you an "external drive" for EC2 instances that are separate, but connectable to the instances. You still have to do copying between S3 and EC2, even though there are no data transfer costs between the two.

You didn't mention an operating system of your instance, so I cannot give tailored information. A popular command line tool I use is http://s3tools.org/s3cmd ... it is based on Python and therefore, according to info on its website it should work on Win as well as Linux, although I use it ALL the time on Linux. You could easily whip up a quick script that uses its built in "sync" command that works similar to rsync, and have it triggered every time you're done processing your data. You could also use the recursive put and get commands to get and put data only when needed.

There are graphical tools like Cloudberry Pro that have some command line options for Windows too that you can setup schedule commands. http://s3tools.org/s3cmd is probably the easiest.

like image 192
Tyler Avatar answered Sep 19 '22 06:09

Tyler


By now, there is a sync command in the AWS Command line tools, that should do the trick: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

On startup: aws s3 sync s3://mybucket /mylocalfolder

before shutdown: aws s3 sync /mylocalfolder s3://mybucket

Of course, the details are always fun to work out eg. how can parallel it is (and can you make it more parallel and is that any faster goven the virtual nature of the whole setup)

Btw hope you're still working on this... or somebody is. ;)

like image 44
Gyuri Avatar answered Sep 19 '22 06:09

Gyuri