Is there a way to stream data directly from a Python request to a MinIO bucket?

I am trying to make a GET request to a server to retrieve a TIFF image. I then want to stream it directly to MinIO using the put_object method in the MinIO Python SDK.

I know I could do this by saving the image to a temp file and then uploading it, but I wanted to see if I could skip that step.

I've tried inserting the byte response directly and using BytesIO to wrap it but I think I am missing something.

r = requests.get(url_to_download, stream=True)
Minio_client.put_object("bucket_name", "stream_test.tiff", r.content, r.headers['Content-length'])

I get back the following error:

AttributeError: 'bytes' object has no attribute 'read'

Any help is much appreciated!

The MinIO documentation for put_object includes examples of adding a new object to the object storage server, but those examples only show how to upload a file.

This is the definition of the put_object function:

put_object(bucket_name, object_name, data, length, content_type='application/octet-stream', metadata=None, progress=None, part_size=5*1024*1024)

We are interested in the data parameter. The documentation describes it as:

Any python object implementing io.RawIOBase.

RawIOBase is the base class for raw binary I/O. Among other things, it defines a read method.

If we use the built-in dir() function to list the attributes of r.content, we can check whether read is among them:

'read' in dir(r.content)  # returns False

That is why you get AttributeError: 'bytes' object has no attribute 'read': r.content is a bytes object, and bytes has no read method. Note also that r.headers['Content-length'] is a string, so it would need converting with int() before being passed as length.
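The check can be reproduced without any network access. The following is a minimal sketch in which payload is a made-up stand-in for r.content (four bytes that happen to match a little-endian TIFF header); it only illustrates the difference between bytes and a file-like wrapper.

```python
import io

# Stand-in for r.content (assumption: any bytes value behaves the same way)
payload = b"II*\x00"

# bytes is plain data, not a file-like object, so it has no read() method
bytes_has_read = hasattr(payload, "read")

# io.BytesIO wraps the same bytes in an in-memory file-like object
stream = io.BytesIO(payload)
stream_has_read = hasattr(stream, "read")

print(bytes_has_read, stream_has_read)

# Reading the wrapper returns the original bytes
roundtrip_ok = stream.read() == payload
print(roundtrip_ok)
```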

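You can wrap r.content in a file-like object with io.BytesIO. Strictly speaking, io.BytesIO derives from io.BufferedIOBase rather than io.RawIOBase, but it provides the read method that put_object needs. To get the size of the object in bytes, use io.BytesIO(r.content).getbuffer().nbytes.

So, to upload the downloaded bytes to your bucket, wrap the bytes object in io.BytesIO. Keep in mind that r.content reads the entire response body into memory first, so this is not streaming in the strict sense: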

import io
import requests

r = requests.get(url_to_download, stream=True)
raw_img = io.BytesIO(r.content)
raw_img_size = raw_img.getbuffer().nbytes

Minio_client.put_object("bucket_name", "stream_test.tiff", raw_img, raw_img_size)

NOTE: The examples in the documentation read binary data from a file and obtain its size from the st_size attribute of the stat_result returned by os.stat().

For the same data, st_size equals io.BytesIO(r.content).getbuffer().nbytes.
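To make the equivalence concrete, here is a small sketch that writes the same bytes to a temporary file and compares the two sizes. The data value is an arbitrary stand-in for r.content. For a bytes object, len(data) gives the same number without creating a buffer.

```python
import io
import os
import tempfile

# Stand-in for r.content (assumption: 2048 arbitrary bytes)
data = b"\x00" * 2048

# Size as reported by the in-memory buffer
buffer_size = io.BytesIO(data).getbuffer().nbytes

# Size as reported by os.stat() after writing the same bytes to a temp file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name
file_size = os.stat(tmp_path).st_size
os.remove(tmp_path)

print(buffer_size, file_size, len(data))
```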

You can stream your file directly into a MinIO bucket like this:

import requests

from pathlib import Path
from urllib.parse import urlparse

from django.conf import settings
from django.core.files.storage import default_storage

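client = default_storage.client
bucket_name = settings.MINIO_STORAGE_MEDIA_BUCKET_NAME

with requests.get(url_to_download, stream=True) as r:
    # derive the object name from the final URL of the response
    object_name = Path(urlparse(r.url).path).name
    # put_object expects the length as an int, not as a header string
    content_length = int(r.headers["Content-Length"])
    # r.raw is the underlying file-like stream, read as it is uploaded
    result = client.put_object(bucket_name, object_name, r.raw, content_length)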

Or you can use a Django file field directly:

with requests.get(url_to_download, stream=True) as r:
    # patch the stream so that django-minio-storage believes
    # it is reading from a regular file
    r.raw.seek = lambda x: 0
    r.raw.size = int(r.headers["Content-Length"])
    model = MyModel()
    model.file.save(object_name, r.raw, save=True)

The RawIOBase hint from Dinko Pehar was really helpful, thanks a lot. However, you have to use response.raw rather than response.content: accessing response.content downloads the whole file into memory immediately, which is really inconvenient when storing a large video, for example.
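One case the snippets above do not cover is a response without a Content-Length header (for example, chunked transfer encoding). The following is a hedged sketch, not part of either answer: upload_stream and PART_SIZE are hypothetical names, and the fallback assumes a recent minio release in which put_object accepts length=-1 together with an explicit part_size. Check the client version you have installed before relying on it.

```python
# Assumed part size for uploads of unknown length (10 MiB)
PART_SIZE = 10 * 1024 * 1024


def upload_stream(client, response, bucket_name, object_name):
    """Pass the raw response stream to put_object without buffering it.

    client is expected to be a Minio client and response a requests
    response obtained with stream=True (both are passed in by the caller).
    """
    header = response.headers.get("Content-Length")
    if header is not None:
        # Known size: header values are strings, so convert to int
        return client.put_object(
            bucket_name, object_name, response.raw, int(header)
        )
    # Unknown size: assumes put_object supports length=-1 with part_size
    return client.put_object(
        bucket_name, object_name, response.raw, -1, part_size=PART_SIZE
    )


# Usage sketch (requires requests, a configured Minio client, and a real URL):
# with requests.get(url_to_download, stream=True) as r:
#     upload_stream(Minio_client, r, "bucket_name", "stream_test.tiff")
```

Because only response.raw is handed over, the body is read as it is uploaded rather than being held in memory as a whole.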

Tags: python, stream, python-3.x, python-requests, minio

Judson Crouch

2 Answers

Dinko Pehar

ephes
