How can I use boto to stream a file out of Amazon S3 to Rackspace Cloudfiles?

Tags: python, amazon-s3, rackspace, boto, cloudfiles

I'm copying a file from S3 to Cloudfiles, and I would like to avoid writing the file to disk. The Python-Cloudfiles library has an object.stream() call that looks like what I need, but I can't find an equivalent call in boto. I'm hoping to be able to do something like:

```python
shutil.copyfileobj(s3Object.stream(), rsObject.stream())
```

Is this possible with boto (or I suppose any other s3 library)?
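
For context, shutil.copyfileobj(fsrc, fdst, length) only needs the source to have a read(size) method and the destination to have a write(data) method. It copies in fixed-size chunks, so nothing touches the disk unless one of the two objects is itself a file. The sketch below uses in-memory io.BytesIO objects as stand-ins for the S3 and Cloudfiles objects (an assumption for illustration; neither library is used here).

```python
import io
import shutil

# Hypothetical stand-ins for the S3 source and the Cloudfiles destination;
# any objects with read(size) / write(data) methods behave the same way.
source = io.BytesIO(b"x" * 100_000)
destination = io.BytesIO()

# copyfileobj calls source.read(16 KiB) and destination.write(chunk) in a loop
# until read() returns b"".
shutil.copyfileobj(source, destination, length=16 * 1024)

print(len(destination.getvalue()))  # 100000
```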

asked Oct 02 '11 06:10

joemastersemison

2 Answers

Other answers in this thread are about boto (version 2), but S3.Object is no longer iterable in boto3. So the following does NOT work; it raises TypeError: 's3.Object' object is not iterable:

```python
import io

import boto3

s3 = boto3.session.Session(profile_name=my_profile).resource('s3')
s3_obj = s3.Object(bucket_name=my_bucket, key=my_key)

with io.FileIO('sample.txt', 'w') as file:
    for i in s3_obj:  # raises TypeError: 's3.Object' object is not iterable
        file.write(i)
```

In boto3, the contents of the object are available as S3.Object.get()['Body'], a botocore StreamingBody, which has been iterable since version 1.9.68 but was not before. So the following works with recent versions of boto3 but not with earlier ones:

```python
body = s3_obj.get()['Body']
with io.FileIO('sample.txt', 'w') as file:
    for i in body:
        file.write(i)
```
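
Because iterating the body yields bytes chunks (1 KiB each by default, as far as I can tell from the botocore source), any writable object can be fed one chunk at a time without holding the whole file. Below is a sketch of a small helper (write_chunks is a hypothetical name, not part of boto3) that accepts any iterable of bytes; a generator stands in for the S3 body so it runs without credentials.

```python
import io
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], dst: BinaryIO) -> int:
    """Hypothetical helper: write each chunk to dst as it arrives, return total bytes."""
    total = 0
    for chunk in chunks:
        dst.write(chunk)
        total += len(chunk)
    return total


# Stand-in for s3_obj.get()['Body']: any iterable of bytes chunks.
fake_body = (bytes([i]) * 1024 for i in range(4))
out = io.BytesIO()
print(write_chunks(fake_body, out))  # 4096
```

With a real body, body.iter_chunks(chunk_size=64 * 1024) should let you pick a larger chunk size than the default.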

An alternative for older boto3 versions is the read method, but calling it without arguments loads the WHOLE S3 object into memory, which is not always feasible for large files:

```python
body = s3_obj.get()['Body']
with io.FileIO('sample.txt', 'w') as file:
    # read() with no argument returns the entire object as a single bytes value
    file.write(body.read())
```

However, read also accepts an amt parameter specifying the maximum number of bytes to read from the underlying stream. It can be called repeatedly until the whole stream has been consumed:

```python
body = s3_obj.get()['Body']
with io.FileIO('sample.txt', 'w') as file:
    # read() returns b'' at end of stream, so write() returns 0 and the loop stops
    while file.write(body.read(amt=512)):
        pass
```
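
The while loop above relies on write() returning 0 once read() returns b''. A more explicit form of the same idea uses the two-argument iter(callable, sentinel) together with functools.partial. This is a sketch: copy_in_chunks is a hypothetical name, and io.BytesIO stands in for the S3 body, which is assumed to expose read(n) the way StreamingBody does.

```python
import functools
import io
from typing import BinaryIO


def copy_in_chunks(src, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Hypothetical helper: copy src to dst chunk by chunk; src only needs read(n)."""
    total = 0
    # iter() keeps calling src.read(chunk_size) until it returns the sentinel b"" (EOF).
    for chunk in iter(functools.partial(src.read, chunk_size), b""):
        dst.write(chunk)
        total += len(chunk)
    return total


src = io.BytesIO(b"abc" * 1000)  # stand-in for s3_obj.get()['Body']
dst = io.BytesIO()
print(copy_in_chunks(src, dst, chunk_size=512))  # 3000
```

Memory use is bounded by chunk_size rather than by the object size, which is the point of avoiding a bare read().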

Digging into the botocore.response.StreamingBody code, one finds that the underlying stream is also exposed (as the private attribute _raw_stream, so this may change without notice), which lets you iterate over it directly:

```python
body = s3_obj.get()['Body']
with io.FileIO('sample.txt', 'w') as file:
    for b in body._raw_stream:
        file.write(b)
```

While searching, I also came across some links that could be useful, but I haven't tried them:

WrappedStreamingBody
Another related thread
An issue on the boto3 GitHub requesting that StreamingBody be a proper stream, which has been closed!
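
If a destination library insists on a real io stream (for example one that calls readinto() or checks readable()), one workaround is a small adapter that subclasses io.RawIOBase and delegates to the body's read(n). This is a sketch, not something boto3 provides; the class name ReadOnlyStream is hypothetical, and io.BytesIO again stands in for the S3 body.

```python
import io


class ReadOnlyStream(io.RawIOBase):
    """Hypothetical adapter: expose any object with read(n) as a readable raw stream."""

    def __init__(self, source):
        self._source = source

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Ask the source for at most len(buffer) bytes and copy them into the buffer.
        data = self._source.read(len(buffer))
        n = len(data)
        buffer[:n] = data
        return n


body = io.BytesIO(b"hello world\n" * 3)  # stand-in for s3_obj.get()['Body']
stream = io.BufferedReader(ReadOnlyStream(body))
print(stream.readline())  # b'hello world\n'
```

Wrapping the adapter in io.BufferedReader adds buffering plus readline() and peek() on top of the plain read(n) interface.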

answered Sep 20 '22 14:09

smallo

The Key object in boto, which represents an object in S3, can be used as an iterator, so you should be able to do something like this:

```python
>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('garnaat_pub')
>>> key = bucket.lookup('Scan1.jpg')
>>> for chunk in key:
...     out.write(chunk)  # out: any writable file-like output stream
```

Or, as in the case of your example, you could do:

```python
>>> import shutil
>>> shutil.copyfileobj(key, rsObject.stream())
```
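
Both forms work because the Key object exposes two interfaces: iteration (for chunk in key) and read(size), which is all shutil.copyfileobj needs from its source. The sketch below is a hypothetical stand-in (ChunkedSource, not boto's actual Key class) showing a minimal object that supports both.

```python
import io
import shutil


class ChunkedSource:
    """Hypothetical stand-in for boto's Key: readable and iterable in chunks."""

    def __init__(self, data: bytes, chunk_size: int = 8192):
        self._stream = io.BytesIO(data)
        self.chunk_size = chunk_size

    def read(self, size: int = -1) -> bytes:
        return self._stream.read(size)

    def __iter__(self):
        # Yield fixed-size chunks until read() returns b"" at end of data.
        while True:
            chunk = self.read(self.chunk_size)
            if not chunk:
                return
            yield chunk


data = b"0123456789" * 2000
chunks = list(ChunkedSource(data, chunk_size=4096))
print([len(c) for c in chunks])  # [4096, 4096, 4096, 4096, 3616]

dst = io.BytesIO()
shutil.copyfileobj(ChunkedSource(data), dst)
print(dst.getvalue() == data)  # True
```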

answered Sep 23 '22 14:09

garnaat
