Judging by the docs for S3.Client.upload_file and S3.Client.upload_fileobj, upload_fileobj may sound faster. But does anyone know the specifics? Should I just upload the file, or should I open the file in binary mode to use upload_fileobj? In other words:
import boto3
s3 = boto3.resource('s3')
### Version 1
s3.meta.client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
### Version 2
with open('/tmp/hello.txt', 'rb') as data:
    s3.meta.client.upload_fileobj(data, 'mybucket', 'hello.txt')
Is version 1 or version 2 better? Is there a difference?
The main point with upload_fileobj is that the file object doesn't have to be stored on local disk in the first place; it may be represented as a file object in RAM. Python has a standard library module (io) for exactly that purpose.
The code would look like:
import io
import boto3

s3 = boto3.client('s3')
fo = io.BytesIO(b'my data stored as file object in RAM')
s3.upload_fileobj(fo, 'mybucket', 'hello.txt')
In that case, it will perform faster, since you don't have to read from the local disk.
In terms of speed, both methods will perform roughly the same: both are written in Python, and the bottleneck will be either disk I/O (reading the file from disk) or network I/O (writing to S3).
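If you want to measure this for your own files and network, here is a rough sketch (the bucket name and path are placeholders) that times both variants:
import time
import boto3

s3 = boto3.client('s3')

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f'{label}: {time.perf_counter() - start:.2f}s')

def v2():
    with open('/tmp/hello.txt', 'rb') as data:
        s3.upload_fileobj(data, 'mybucket', 'hello.txt')

timed('upload_file', lambda: s3.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt'))
timed('upload_fileobj', v2)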
Use upload_file() when writing code that only handles uploading files from disk.
Use upload_fileobj() when writing generic code to handle S3 uploads that may be reused in the future for more than just files from disk.
There is a convention in multiple places, including the Python standard library, that when one uses the term fileobj she means a file-like object. There are even some libraries exposing functions that can take a file path (str) or a fileobj (a file-like object) as the same parameter.
When using a file object, your code is not limited to disk. For example:
You can copy data from one S3 object into another in a streaming fashion (without using disk space or slowing the process down with read/write I/O to disk); see the sketch below.
You can (de)compress or decrypt data on the fly when writing objects to S3.
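For instance, a minimal streaming-copy sketch (the bucket and key names here are placeholders): the Body returned by get_object is a file-like object, so upload_fileobj can read from it in chunks without the data ever touching local disk.
import boto3

s3 = boto3.client('s3')

# get_object returns a streaming, file-like Body; upload_fileobj
# reads it in chunks, so nothing is buffered on local disk.
source = s3.get_object(Bucket='src-bucket', Key='data.bin')
s3.upload_fileobj(source['Body'], 'dst-bucket', 'data-copy.bin')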
An example using the Python gzip module with a file-like object in a generic way:
import gzip, io

def gzip_greet_file(fileobj):
    """write gzipped hello message to a file"""
    with gzip.open(filename=fileobj, mode='wb') as fp:
        fp.write(b'hello!')

# using an opened file
gzip_greet_file(open('/tmp/a.gz', 'wb'))

# using a filename from disk
gzip_greet_file('/tmp/b.gz')

# using an io buffer
file = io.BytesIO()
gzip_greet_file(file)
file.seek(0)
print(file.getvalue())
tarfile, on the other hand, has two separate parameters, name & fileobj:
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
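To illustrate, a minimal sketch (the entry name is made up) that builds a tar archive entirely in memory through the fileobj parameter:
import io, tarfile

buf = io.BytesIO()
# write a one-entry archive into the in-memory buffer via fileobj=
with tarfile.open(fileobj=buf, mode='w') as tar:
    data = b'hello!'
    info = tarfile.TarInfo(name='hello.txt')  # hypothetical entry name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)
# read the archive back from the same buffer
with tarfile.open(fileobj=buf, mode='r') as tar:
    print(tar.getnames())  # ['hello.txt']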
Example with s3.upload_fileobj():
import gzip, io, boto3

s3 = boto3.client('s3')

def upload_file(fileobj, bucket, key, compress=False):
    if compress:
        # compress into an in-memory buffer first; GzipFile in 'rb'
        # mode would decompress the stream rather than compress it
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            gz.write(fileobj.read())
        buf.seek(0)
        fileobj, key = buf, key + '.gz'
    s3.upload_fileobj(fileobj, bucket, key)
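Usage might then look like this (the bucket name is a placeholder):
with open('/tmp/hello.txt', 'rb') as f:
    upload_file(f, 'mybucket', 'hello.txt', compress=True)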