
What is the difference between S3.Client.upload_file() and S3.Client.upload_fileobj()?

According to S3.Client.upload_file and S3.Client.upload_fileobj, upload_fileobj may sound faster. But does anyone know specifics? Should I just upload the file, or should I open the file in binary mode to use upload_fileobj? In other words,

import boto3

s3 = boto3.resource('s3')

### Version 1
s3.meta.client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')

### Version 2
with open('/tmp/hello.txt', 'rb') as data:
    s3.meta.client.upload_fileobj(data, 'mybucket', 'hello.txt')

Is version 1 or version 2 better? Is there a difference?

Flair asked Sep 14 '18 17:09



2 Answers

The main point of upload_fileobj is that the file object doesn't have to be stored on local disk in the first place; it can be a file object held in RAM.

Python has a standard library module for that purpose: io.

The code will look like this:

import io
import boto3

s3 = boto3.client('s3')
fo = io.BytesIO(b'my data stored as file object in RAM')
s3.upload_fileobj(fo, 'mybucket', 'hello.txt')

In that case the upload can be faster, since the data never has to be read from local disk.
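One thing to watch with in-memory buffers: as far as I know, upload_fileobj reads from the buffer's current position, so a buffer you have just written into should be rewound before it is uploaded. A minimal sketch, where lines_to_buffer is a hypothetical helper and not part of boto3:

```python
import io

def lines_to_buffer(lines):
    """Write text lines into a binary buffer in RAM and rewind it."""
    buf = io.BytesIO()
    for line in lines:
        buf.write(line.encode('utf-8') + b'\n')
    # after writing, the position is at the end; without this seek a
    # subsequent read() (and hence an upload) would see no data
    buf.seek(0)
    return buf
```

The returned buffer can then be passed as the first argument to upload_fileobj in place of an opened file.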

Samuel answered Oct 06 '22 01:10


TL;DR

In terms of speed, both methods perform roughly the same: both are written in Python, and the bottleneck will be either disk I/O (reading the file from disk) or network I/O (writing to S3).

  • Use upload_file() when writing code that only handles uploading files from disk.
  • Use upload_fileobj() when writing generic S3 upload code that may later be reused for sources other than files on disk.


What is fileobj anyway?

There is a convention in multiple places, including the Python standard library, that the term fileobj means a file-like object. Some libraries even expose functions that accept either a file path (str) or a fileobj (file-like object) through the same parameter.
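The "path or fileobj through the same parameter" idea can be sketched for S3 uploads as follows. upload_any is a hypothetical helper name, and client is assumed to be something like boto3.client('s3'); it is passed in as an argument so the function itself stays easy to test:

```python
import os

def upload_any(client, source, bucket, key):
    """Upload `source`, which may be a path or a binary file-like object."""
    if isinstance(source, (str, os.PathLike)):
        # a path on disk: let the client open and read the file itself
        client.upload_file(os.fspath(source), bucket, key)
    else:
        # otherwise assume a file-like object with a binary read() method
        client.upload_fileobj(source, bucket, key)
```

Callers can then pass '/tmp/hello.txt' or an io.BytesIO buffer without caring which client method ends up being used.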

When using a file object, your code is not limited to disk. For example:

  1. You can copy data from one S3 object into another in a streaming fashion (without using disk space or slowing the process down with read/write I/O to disk).

  2. You can compress, decompress, or decrypt data on the fly when writing objects to S3.
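A rough sketch of the first case, copying between S3 objects without touching disk. stream_copy is a hypothetical helper, client is assumed to be a boto3 S3 client, and the bucket and key names are whatever the caller supplies:

```python
def stream_copy(client, src_bucket, src_key, dst_bucket, dst_key):
    """Copy one S3 object to another by streaming it through this process."""
    # get_object returns a dict whose 'Body' entry is a readable stream
    response = client.get_object(Bucket=src_bucket, Key=src_key)
    # upload_fileobj only needs an object with a read() that returns bytes
    client.upload_fileobj(response['Body'], dst_bucket, dst_key)
```

For a plain copy with no transformation, a server-side copy is usually the better tool; streaming like this is mainly useful when you want to wrap the body in something that transforms the data on the way through.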

Example using the Python gzip module with a file-like object in a generic way:

import gzip, io

def gzip_greet_file(fileobj):
    """Write a gzipped hello message to a file path or a file-like object."""
    with gzip.open(filename=fileobj, mode='wb') as fp:
        fp.write(b'hello!')

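# using an opened file (the with block makes sure it gets closed)
with open('/tmp/a.gz', 'wb') as f:
    gzip_greet_file(f)

# using a filename on disk
gzip_greet_file('/tmp/b.gz')

# using an in-memory buffer
buf = io.BytesIO()
gzip_greet_file(buf)
# getvalue() returns the whole buffer regardless of the current position
print(buf.getvalue())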

tarfile, on the other hand, has two separate parameters, name and fileobj:

tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
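To illustrate the fileobj parameter, here is a small sketch that builds a tar archive entirely in RAM. tar_bytes is a hypothetical helper name; the tarfile and io calls are standard library:

```python
import io
import tarfile

def tar_bytes(files):
    """Pack a {name: bytes} mapping into an uncompressed tar archive in RAM."""
    buf = io.BytesIO()
    # fileobj= makes tarfile write into the buffer instead of a path on disk
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # rewind so the next reader starts at the beginning of the archive
    buf.seek(0)
    return buf
```

The resulting buffer is an ordinary file-like object, so it could be handed to anything that accepts a fileobj, including upload_fileobj.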


Example compression on-the-fly with s3.upload_fileobj()

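import gzip, io, shutil
import boto3

s3 = boto3.client('s3')


def upload_file(fileobj, bucket, key, compress=False):
    """Upload a binary file-like object, optionally gzip-compressing it."""
    if compress:
        # GzipFile in 'rb' mode would decompress; to compress, write through
        # a 'wb' GzipFile into a buffer (note: held fully in RAM) and upload that
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            shutil.copyfileobj(fileobj, gz)
        buf.seek(0)
        fileobj = buf
        key = key + '.gz'
    s3.upload_fileobj(fileobj, bucket, key)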

ShmulikA answered Oct 05 '22 23:10