I'm copying a file from S3 to Cloudfiles, and I would like to avoid writing the file to disk. The Python-Cloudfiles library has an object.stream() call that looks to be what I need, but I can't find an equivalent call in boto. I'm hoping that I would be able to do something like:
shutil.copyfileobj(s3Object.stream(),rsObject.stream())
Is this possible with boto (or I suppose any other s3 library)?
Other answers in this thread are related to boto, but S3.Object is no longer iterable in boto3. So the following DOES NOT WORK; it produces a TypeError: 's3.Object' object is not iterable error message:
    s3 = boto3.session.Session(profile_name=my_profile).resource('s3')
    s3_obj = s3.Object(bucket_name=my_bucket, key=my_key)
    with io.FileIO('sample.txt', 'w') as file:
        for i in s3_obj:
            file.write(i)
In boto3, the contents of the object are available at S3.Object.get()['Body'], which is an iterable since version 1.9.68 but wasn't previously. Thus the following will work for the latest versions of boto3 but not earlier ones:
    body = s3_obj.get()['Body']
    with io.FileIO('sample.txt', 'w') as file:
        for i in body:
            file.write(i)
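If you are unsure which boto3 release is installed, a minimal sketch (relying only on the standard boto3.__version__ string, not on anything from the original answer) can guard the iteration at runtime:

    import io
    import boto3

    # Sketch: only iterate the body directly when the installed boto3 is
    # new enough (1.9.68+, per the note above) for StreamingBody iteration.
    if tuple(int(p) for p in boto3.__version__.split('.')[:3]) >= (1, 9, 68):
        body = s3_obj.get()['Body']
        with io.FileIO('sample.txt', 'w') as file:
            for chunk in body:
                file.write(chunk)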
So an alternative for older boto3 versions is to use the read method, but this loads the WHOLE S3 object in memory, which is not always a possibility when dealing with large files:
    body = s3_obj.get()['Body']
    with io.FileIO('sample.txt', 'w') as file:
        # read() with no arguments returns the entire object as one bytes value
        file.write(body.read())
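If the object does fit comfortably in memory, one common follow-on (a stdlib sketch, not part of the original answer) is to wrap the bytes in io.BytesIO so later code can treat them as a file-like object:

    import io

    body = s3_obj.get()['Body']
    # Read everything once, then expose it as an in-memory, seekable
    # file-like object for APIs that expect one (e.g. shutil.copyfileobj).
    buffer = io.BytesIO(body.read())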
But the read method allows passing in the amt parameter, which specifies the number of bytes to read from the underlying stream. This method can be called repeatedly until the whole stream has been read:
    body = s3_obj.get()['Body']
    with io.FileIO('sample.txt', 'w') as file:
        while file.write(body.read(amt=512)):
            pass
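Because StreamingBody offers this file-like read() interface, it can also be handed straight to shutil.copyfileobj, which is essentially what the question was after. A sketch, writing to a local file here, though any writable file-like object (such as the question's rsObject.stream()) should work the same way:

    import io
    import shutil

    body = s3_obj.get()['Body']
    with io.FileIO('sample.txt', 'w') as file:
        # copyfileobj repeatedly calls body.read(length), so only one
        # 512-byte buffer is held in memory at a time.
        shutil.copyfileobj(body, file, 512)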
Digging into the botocore.response.StreamingBody code, one realizes that the underlying stream is also available, so we could iterate as follows:
    body = s3_obj.get()['Body']
    with io.FileIO('sample.txt', 'w') as file:
        for b in body._raw_stream:
            file.write(b)
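As an aside (not part of the original answer), newer botocore releases also expose public iter_chunks() and iter_lines() helpers on StreamingBody, which avoid touching the private _raw_stream attribute. A sketch, assuming a version where they exist:

    body = s3_obj.get()['Body']
    with io.FileIO('sample.txt', 'w') as file:
        # iter_chunks() yields the body in fixed-size byte chunks.
        for chunk in body.iter_chunks(chunk_size=512):
            file.write(chunk)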
While googling I've also seen some links that could be useful, but I haven't tried them.
The Key object in boto, which represents an object in S3, can be used like an iterator, so you should be able to do something like this:
>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('garnaat_pub')
>>> key = bucket.lookup('Scan1.jpg')
>>> for bytes in key:
...     # write bytes to output stream
Or, as in the case of your example, you could do:
>>> shutil.copyfileobj(key, rsObject.stream())
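If you prefer not to drive the copy yourself, boto's Key also provides get_contents_to_file(), which writes the object into any file-like object you pass it. A sketch along the same lines, assuming rsObject.stream() is writable as in the question:

>>> import boto
>>> c = boto.connect_s3()
>>> bucket = c.lookup('garnaat_pub')
>>> key = bucket.lookup('Scan1.jpg')
>>> # Streams the object's bytes straight into the destination stream,
>>> # without writing anything to local disk.
>>> key.get_contents_to_file(rsObject.stream())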