Judging by the docs for S3.Client.upload_file and S3.Client.upload_fileobj, upload_fileobj may sound faster. But does anyone know the specifics? Should I just upload the file, or should I open the file in binary mode to use upload_fileobj? In other words:
import boto3
s3 = boto3.resource('s3')
### Version 1
s3.meta.client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
### Version 2
with open('/tmp/hello.txt', 'rb') as data:
    s3.meta.client.upload_fileobj(data, 'mybucket', 'hello.txt')
Is version 1 or version 2 better? Is there a difference?
The main point with upload_fileobj is that the file object doesn't have to be stored on local disk in the first place; it may be represented as a file object in RAM. Python has a standard library module (io) for exactly that purpose.
The code would look like:
import io
import boto3

s3 = boto3.client('s3')
fo = io.BytesIO(b'my data stored as file object in RAM')
s3.upload_fileobj(fo, 'mybucket', 'hello.txt')
In that case, it will perform faster, since you don't have to read from the local disk.
In terms of speed, both methods will perform roughly the same: both are written in Python, and the bottleneck will be either disk I/O (reading the file from disk) or network I/O (writing to S3).
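If you want to measure this for your own files and network, here is a rough sketch (the bucket name and path are placeholders) that times both variants:
import time
import boto3

s3 = boto3.client('s3')

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f'{label}: {time.perf_counter() - start:.2f}s')

def v2():
    with open('/tmp/hello.txt', 'rb') as data:
        s3.upload_fileobj(data, 'mybucket', 'hello.txt')

timed('upload_file', lambda: s3.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt'))
timed('upload_fileobj', v2)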
Use upload_file() when writing code that only handles uploading files from disk.
Use upload_fileobj() when writing generic code to handle S3 uploads that may be reused in the future for more than just files from disk.
There is a convention in multiple places, including the Python standard library, that when one uses the term fileobj she means a file-like object. There are even some libraries exposing functions that can take a file path (str) or a fileobj (a file-like object) as the same parameter.
When using a file object, your code is not limited to disk. For example:
You can copy data from one S3 object into another in a streaming fashion (without using disk space or slowing the process down with read/write I/O to disk); see the sketch below.
You can (de)compress or decrypt data on the fly when writing objects to S3.
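For instance, a minimal streaming-copy sketch (the bucket and key names here are placeholders): the Body returned by get_object is a file-like object, so upload_fileobj can read from it in chunks without the data ever touching local disk.
import boto3

s3 = boto3.client('s3')

# get_object returns a streaming, file-like Body; upload_fileobj
# reads it in chunks, so nothing is buffered on local disk.
source = s3.get_object(Bucket='src-bucket', Key='data.bin')
s3.upload_fileobj(source['Body'], 'dst-bucket', 'data-copy.bin')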
An example using the Python gzip module with a file-like object in a generic way:
import gzip, io

def gzip_greet_file(fileobj):
    """write gzipped hello message to a file"""
    with gzip.open(filename=fileobj, mode='wb') as fp:
        fp.write(b'hello!')

# using an opened file
gzip_greet_file(open('/tmp/a.gz', 'wb'))

# using a filename from disk
gzip_greet_file('/tmp/b.gz')

# using an io buffer
file = io.BytesIO()
gzip_greet_file(file)
file.seek(0)
print(file.getvalue())
tarfile, on the other hand, has two separate parameters, name & fileobj:
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
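To illustrate, a minimal sketch (the entry name is made up) that builds a tar archive entirely in memory through the fileobj parameter:
import io, tarfile

buf = io.BytesIO()
# write a one-entry archive into the in-memory buffer via fileobj=
with tarfile.open(fileobj=buf, mode='w') as tar:
    data = b'hello!'
    info = tarfile.TarInfo(name='hello.txt')  # hypothetical entry name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)
# read the archive back from the same buffer
with tarfile.open(fileobj=buf, mode='r') as tar:
    print(tar.getnames())  # ['hello.txt']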
Example with s3.upload_fileobj():
import gzip, io, boto3

s3 = boto3.client('s3')

def upload_file(fileobj, bucket, key, compress=False):
    if compress:
        # compress into an in-memory buffer first; GzipFile in 'rb'
        # mode would decompress the stream rather than compress it
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            gz.write(fileobj.read())
        buf.seek(0)
        fileobj, key = buf, key + '.gz'
    s3.upload_fileobj(fileobj, bucket, key)
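Usage might then look like this (the bucket name is a placeholder):
with open('/tmp/hello.txt', 'rb') as f:
    upload_file(f, 'mybucket', 'hello.txt', compress=True)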