I need to copy all keys from '/old/dir/' to '/new/dir/' in an Amazon S3 bucket. I came up with this script (quick hack):
import boto

s3 = boto.connect_s3()
thebucket = s3.get_bucket("bucketname")
keys = thebucket.list('/old/dir')
for k in keys:
    # Build the destination key name by swapping the prefix
    newkeyname = '/new/dir' + k.name.partition('/old/dir')[2]
    print 'new key name:', newkeyname
    # Server-side copy within the same bucket
    thebucket.copy_key(newkeyname, k.bucket.name, k.name)
For now it works, but it is much slower than what I can do manually in the graphical management console by just copying and pasting with the mouse. Very frustrating, and there are lots of keys to copy...
Do you know of any quicker method? Thanks.
Edit: maybe I can do this with concurrent copy processes. I'm not really familiar with boto's copy_key method or with how many concurrent requests I can send to Amazon.
Edit 2: I'm currently learning Python multiprocessing. Let's see if I can send 50 copy operations simultaneously...
Edit 3: I tried 30 concurrent copies using the Python multiprocessing module. Copying was much faster than within the console and less error prone. There is a new issue with large files (>5GB): boto raises an exception. I need to debug this before posting the updated script. A rough sketch of the approach is below.
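In the meantime, the concurrent version looks roughly like this (a sketch only: the bucket name and prefixes are placeholders, and the >5GB case is not handled here):

import boto
from multiprocessing import Pool

def copy_one(names):
    oldname, newname = names
    # Each worker opens its own connection; boto connections shouldn't be shared across processes
    bucket = boto.connect_s3().get_bucket("bucketname")
    bucket.copy_key(newname, "bucketname", oldname)

if __name__ == '__main__':
    bucket = boto.connect_s3().get_bucket("bucketname")
    jobs = [(k.name, '/new/dir' + k.name.partition('/old/dir')[2])
            for k in bucket.list('/old/dir')]
    pool = Pool(processes=30)  # 30 concurrent copy workers
    pool.map(copy_one, jobs)
    pool.close()
    pool.join()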
Regarding your issue with files over 5GB: S3 doesn't support uploading files over 5GB using the PUT method, which is what boto tries to do (see boto source, Amazon S3 documentation).
Unfortunately, I'm not sure how you can get around this, apart from downloading the object and re-uploading it as a multipart upload. I don't think boto supports a multipart copy operation yet (if such a thing even exists).
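If you do go the download-and-re-upload route, a minimal multipart-upload sketch with boto might look something like this (the local path, part size and key names are illustrative, and it assumes the third-party filechunkio package for reading file slices):

import math
import os
import boto
from filechunkio import FileChunkIO  # third-party helper for reading slices of a file

source_path = '/tmp/bigfile'          # the object, downloaded locally (illustrative path)
source_size = os.stat(source_path).st_size
part_size = 50 * 1024 * 1024          # 50MB parts; every part except the last must be >= 5MB

bucket = boto.connect_s3().get_bucket("bucketname")
mp = bucket.initiate_multipart_upload('/new/dir/bigfile')
part_count = int(math.ceil(source_size / float(part_size)))
for i in range(part_count):
    offset = part_size * i
    nbytes = min(part_size, source_size - offset)
    with FileChunkIO(source_path, 'r', offset=offset, bytes=nbytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)
mp.complete_upload()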