
How to write a .npy file to S3 directly?

I would like to know whether there is a way to write an array as a NumPy file (.npy) directly to an AWS S3 bucket. I can use np.save to save the file locally, as shown below, but I am looking for a solution that writes it straight to S3 without saving it locally first.

import numpy as np

a = np.array([1, 2, 3, 4])
np.save('/my/localfolder/test1.npy', a)
user121 asked Jan 01 '18 12:01


2 Answers

If you want to bypass the local disk and upload the data directly to the cloud, you can use pickle instead of a .npy file:

import boto3
import io
import pickle

import numpy

s3_client = boto3.client('s3')

my_array = numpy.random.randn(10)

# upload without using disk
my_array_data = io.BytesIO()
pickle.dump(my_array, my_array_data)
my_array_data.seek(0)  # rewind so the upload reads from the start of the buffer
s3_client.upload_fileobj(my_array_data, 'your-bucket', 'your-file.pkl')

# download without using disk
my_array_data2 = io.BytesIO()
s3_client.download_fileobj('your-bucket', 'your-file.pkl', my_array_data2)
my_array_data2.seek(0)  # rewind before reading the downloaded bytes back
my_array2 = pickle.load(my_array_data2)

# check that everything is correct
numpy.allclose(my_array, my_array2)


  • boto3
  • pickle
  • BytesIO
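
The buffer handling in the snippet above can be checked locally, without S3. Here is a minimal sketch, assuming only NumPy and the standard library; the helper names dumps_to_buffer and loads_from_buffer are hypothetical and not part of any library. It shows that after pickle.dump the stream position sits at the end of the buffer, so anything that reads from the current position gets no data unless the buffer is first rewound with seek(0).

```python
import io
import pickle

import numpy as np


def dumps_to_buffer(array):
    """Pickle an array into an in-memory buffer and rewind it (hypothetical helper)."""
    buffer = io.BytesIO()
    pickle.dump(array, buffer)
    buffer.seek(0)  # move back to the start so the next read sees the data
    return buffer


def loads_from_buffer(buffer):
    """Rewind a buffer and unpickle the array stored in it (hypothetical helper)."""
    buffer.seek(0)
    return pickle.load(buffer)


original = np.arange(5, dtype=np.float64)

# Without a rewind, the position is at the end and read() returns nothing.
unrewound = io.BytesIO()
pickle.dump(original, unrewound)
bytes_without_rewind = unrewound.read()

# With a rewind, the same kind of buffer yields the full payload.
rewound = dumps_to_buffer(original)
restored = loads_from_buffer(rewound)
```

An empty bytes_without_rewind is the local equivalent of an upload that ends up with no content, which is why the seek(0) calls matter.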
M1L0U answered Sep 20 '22 09:09


I've recently run into dependency conflicts between s3fs and boto3, so I try to avoid s3fs. This solution depends only on boto3, does not write to disk, and does not explicitly use pickle.


from io import BytesIO
import numpy as np
from urllib.parse import urlparse
import boto3
client = boto3.client("s3")

def to_s3_npy(data: np.ndarray, s3_uri: str):
    # s3_uri looks like f"s3://{BUCKET_NAME}/{KEY}"
    bytes_ = BytesIO()
    np.save(bytes_, data, allow_pickle=True)
    bytes_.seek(0)  # rewind so the upload reads from the start of the buffer
    parsed_s3 = urlparse(s3_uri)
    client.upload_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    return True


def from_s3_npy(s3_uri: str):
    bytes_ = BytesIO()
    parsed_s3 = urlparse(s3_uri)
    client.download_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    bytes_.seek(0)  # rewind before np.load reads the downloaded bytes
    return np.load(bytes_, allow_pickle=True)
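
As a variation on the functions above, the serialized bytes can also be sent with put_object and read back with get_object. Here is a minimal sketch, assuming an S3 client is passed in as an argument. The names split_s3_uri, save_npy_bytes, load_npy_bytes and FakeS3Client are hypothetical, and allow_pickle=False is an assumption that suits plain numeric arrays. FakeS3Client is an in-memory stand-in that exists only so the sketch runs without AWS credentials.

```python
from io import BytesIO
from urllib.parse import urlparse

import numpy as np


def split_s3_uri(s3_uri):
    """Split "s3://bucket/key" into (bucket, key). Hypothetical helper."""
    parsed = urlparse(s3_uri)
    return parsed.netloc, parsed.path.lstrip("/")


def save_npy_bytes(client, data, s3_uri):
    """Serialize `data` in .npy format and send it with put_object."""
    buffer = BytesIO()
    np.save(buffer, data, allow_pickle=False)
    bucket, key = split_s3_uri(s3_uri)
    # getvalue() returns the full contents, so no rewind is needed here.
    client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())


def load_npy_bytes(client, s3_uri):
    """Fetch an object with get_object and decode it as a .npy array."""
    bucket, key = split_s3_uri(s3_uri)
    response = client.get_object(Bucket=bucket, Key=key)
    return np.load(BytesIO(response["Body"].read()), allow_pickle=False)


class FakeS3Client:
    """In-memory stand-in for a boto3 S3 client, for local testing only."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": BytesIO(self.objects[(Bucket, Key)])}


fake_client = FakeS3Client()
array = np.array([1, 2, 3, 4])
save_npy_bytes(fake_client, array, "s3://your-bucket/test1.npy")
loaded = load_npy_bytes(fake_client, "s3://your-bucket/test1.npy")
```

With a real bucket, pass boto3.client("s3") in place of the fake client. Because getvalue() returns the whole buffer regardless of the stream position, this variant does not need a seek(0) before uploading.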
Wesley Cheek answered Sep 20 '22 09:09