 

How to write a .npy file to S3 directly?

I would like to know if there is any way to write an array directly to an AWS S3 bucket as a NumPy file (.npy). I can use np.save to save a file locally, as shown below, but I am looking for a way to write it to S3 directly, without saving it locally first.

a = np.array([1, 2, 3, 4])
np.save('/my/localfolder/test1.npy', a)
asked Jan 01 '18 by user121


2 Answers

If you want to bypass your local disk and upload the data directly to the cloud, you may want to use pickle instead of a .npy file:

import boto3
import io
import numpy
import pickle

s3_client = boto3.client('s3')

my_array = numpy.random.randn(10)

# upload without using disk
my_array_data = io.BytesIO()
pickle.dump(my_array, my_array_data)
my_array_data.seek(0)
s3_client.upload_fileobj(my_array_data, 'your-bucket', 'your-file.pkl')

# download without using disk
my_array_data2 = io.BytesIO()
s3_client.download_fileobj('your-bucket', 'your-file.pkl', my_array_data2)
my_array_data2.seek(0)
my_array2 = pickle.load(my_array_data2)

# check that everything is correct
numpy.allclose(my_array, my_array2)

Documentation:

  • boto3
  • pickle
  • BytesIO
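
Note that pickle.load should only be used on data you trust, since unpickling can execute arbitrary code. For small arrays, the same in-memory round trip can also be written without BytesIO, using boto3's put_object and get_object. A minimal sketch of that variant; the bucket and key names are placeholders:

import pickle

import boto3
import numpy

s3_client = boto3.client('s3')
my_array = numpy.random.randn(10)

# serialize in memory and upload in one call
s3_client.put_object(Bucket='your-bucket', Key='your-file.pkl',
                     Body=pickle.dumps(my_array))

# download and deserialize, again without touching disk
response = s3_client.get_object(Bucket='your-bucket', Key='your-file.pkl')
my_array2 = pickle.loads(response['Body'].read())

# check that everything is correct
numpy.allclose(my_array, my_array2)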
answered Sep 20 '22 by M1L0U


I've recently run into dependency conflicts between s3fs and boto3, so I try to avoid s3fs. This solution depends only on boto3, does not write to disk, and does not explicitly use pickle.

Saving:

from io import BytesIO
import numpy as np
from urllib.parse import urlparse
import boto3
client = boto3.client("s3")

def to_s3_npy(data: np.ndarray, s3_uri: str):
    # s3_uri looks like f"s3://{BUCKET_NAME}/{KEY}"
    bytes_ = BytesIO()
    np.save(bytes_, data, allow_pickle=True)
    bytes_.seek(0)
    parsed_s3 = urlparse(s3_uri)
    # parsed_s3.path starts with "/", so drop it to get the object key
    client.upload_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    return True

Loading:

def from_s3_npy(s3_uri: str):
    bytes_ = BytesIO()
    parsed_s3 = urlparse(s3_uri)
    client.download_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    bytes_.seek(0)
    return np.load(bytes_, allow_pickle=True)
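
Used together, the two helpers give a disk-free round trip. A short usage sketch; the bucket name below is a placeholder:

a = np.array([1, 2, 3, 4])
to_s3_npy(a, "s3://my-bucket/test1.npy")
b = from_s3_npy("s3://my-bucket/test1.npy")
assert np.array_equal(a, b)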
answered Sep 20 '22 by Wesley Cheek