
How to efficiently copy all files from one directory to another in an Amazon S3 bucket with boto?

I need to copy all keys from '/old/dir/' to '/new/dir/' in an Amazon S3 bucket. I came up with this script (a quick hack):

import boto

# boto 2.x: open one connection and copy each key server-side with copy_key()
s3 = boto.connect_s3()
thebucket = s3.get_bucket("bucketname")
keys = thebucket.list('/old/dir')
for k in keys:
    # rebuild the key name under the new prefix
    newkeyname = '/new/dir' + k.name.partition('/old/dir')[2]
    print 'new key name:', newkeyname
    # copy_key(new_key_name, src_bucket_name, src_key_name)
    thebucket.copy_key(newkeyname, k.bucket.name, k.name)

For now it works, but it is much slower than what I can do manually in the graphical management console by just copy/pasting with the mouse. Very frustrating, and there are lots of keys to copy...

Do you know of any quicker method? Thanks.

Edit: maybe I can do this with concurrent copy processes. I'm not really familiar with boto's key-copying methods or with how many concurrent requests I can send to Amazon.

Edit 2: I'm currently learning Python's multiprocessing module. Let's see if I can send 50 copy operations simultaneously...

Edit 3: I tried 30 concurrent copies using the Python multiprocessing module. Copying was much faster than in the console and less error prone. There is a new issue with large files (>5 GB): boto raises an exception. I need to debug this before posting the updated script.
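
For illustration, here is a minimal sketch of that concurrent approach with multiprocessing.Pool, assuming boto 2.x; the bucket name, the prefixes, and the pool size of 30 are placeholders taken from the description above:

import boto
from multiprocessing import Pool

BUCKET_NAME = 'bucketname'
SRC_PREFIX = '/old/dir'
DST_PREFIX = '/new/dir'

def copy_one(key_name):
    # each worker opens its own connection; boto connection objects
    # should not be shared across processes
    s3 = boto.connect_s3()
    bucket = s3.get_bucket(BUCKET_NAME)
    new_name = DST_PREFIX + key_name.partition(SRC_PREFIX)[2]
    bucket.copy_key(new_name, BUCKET_NAME, key_name)
    return new_name

if __name__ == '__main__':
    s3 = boto.connect_s3()
    bucket = s3.get_bucket(BUCKET_NAME)
    names = [k.name for k in bucket.list(SRC_PREFIX)]
    pool = Pool(processes=30)
    for copied in pool.imap_unordered(copy_one, names):
        print 'copied:', copied
    pool.close()
    pool.join()

Since copy_key() is a server-side copy, each worker only sends a small request; the pool size mainly trades concurrency against S3 request throttling.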

asked Feb 09 '12 at 21:02 by ascobol

1 Answer

Regarding your issue with files over 5 GB: S3 doesn't support uploading files over 5 GB with a single PUT request, which is what boto tries to do here (see the boto source and the Amazon S3 documentation).

Unfortunately, I'm not sure how you can get around this, apart from downloading the object and re-uploading it as a multi-part upload. I don't think boto supports a multi-part copy operation yet (if such a thing even exists).
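
If a multi-part copy does turn out to be available in a newer boto release (later 2.x versions appear to add MultiPartUpload.copy_part_from_key), a server-side copy in parts could look roughly like the sketch below. Treat the method and its availability as an assumption and check your installed version; the part size and the bucket/key names are placeholders.

import boto

PART_SIZE = 1024 * 1024 * 1024  # 1 GB per part; S3 parts must be at least 5 MB

def multipart_copy(bucket_name, src_key_name, dst_key_name):
    s3 = boto.connect_s3()
    bucket = s3.get_bucket(bucket_name)
    size = bucket.get_key(src_key_name).size

    # assumption: initiate_multipart_upload()/copy_part_from_key() exist
    # in the boto release you are running
    mp = bucket.initiate_multipart_upload(dst_key_name)
    try:
        part_num = 1
        start = 0
        while start < size:
            end = min(start + PART_SIZE, size) - 1  # byte ranges are inclusive
            mp.copy_part_from_key(bucket_name, src_key_name, part_num,
                                  start=start, end=end)
            part_num += 1
            start = end + 1
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()
        raise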

answered Oct 10 '22 at 03:10 by Chris B