 

How to write a .npy file to S3 directly?

I would like to know if there is any way to write an array directly to an AWS S3 bucket as a NumPy file (.npy). I can use np.save to save a file locally, as shown below, but I am looking for a way to write it to S3 directly, without saving it locally first.

a = np.array([1, 2, 3, 4])
np.save('/my/localfolder/test1.npy', a)
asked Jan 01 '18 by user121


2 Answers

If you want to bypass your local disk and upload the data directly to the cloud, you may want to use pickle instead of a .npy file:

import boto3
import io
import numpy
import pickle

s3_client = boto3.client('s3')

my_array = numpy.random.randn(10)

# upload without using disk
my_array_data = io.BytesIO()
pickle.dump(my_array, my_array_data)
my_array_data.seek(0)
s3_client.upload_fileobj(my_array_data, 'your-bucket', 'your-file.pkl')

# download without using disk
my_array_data2 = io.BytesIO()
s3_client.download_fileobj('your-bucket', 'your-file.pkl', my_array_data2)
my_array_data2.seek(0)
my_array2 = pickle.load(my_array_data2)

# check that everything is correct
numpy.allclose(my_array, my_array2)

Documentation:

  • boto3
  • pickle
  • BytesIO
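
Note that pickle.load should only be used on data you trust, since unpickling can execute arbitrary code. For small arrays, the same in-memory round trip can also be written without BytesIO, using boto3's put_object and get_object. A minimal sketch of that variant; the bucket and key names are placeholders:

import pickle

import boto3
import numpy

s3_client = boto3.client('s3')
my_array = numpy.random.randn(10)

# serialize in memory and upload in one call
s3_client.put_object(Bucket='your-bucket', Key='your-file.pkl',
                     Body=pickle.dumps(my_array))

# download and deserialize, again without touching disk
response = s3_client.get_object(Bucket='your-bucket', Key='your-file.pkl')
my_array2 = pickle.loads(response['Body'].read())

# check that everything is correct
numpy.allclose(my_array, my_array2)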
answered Sep 20 '22 by M1L0U


I've recently run into dependency conflicts between s3fs and boto3, so I try to avoid s3fs. This solution depends only on boto3, does not write to disk, and does not explicitly use pickle.

Saving:

from io import BytesIO
import numpy as np
from urllib.parse import urlparse
import boto3
client = boto3.client("s3")

def to_s3_npy(data: np.ndarray, s3_uri: str):
    # s3_uri looks like f"s3://{BUCKET_NAME}/{KEY}"
    bytes_ = BytesIO()
    np.save(bytes_, data, allow_pickle=True)
    bytes_.seek(0)
    parsed_s3 = urlparse(s3_uri)
    # parsed_s3.path starts with "/", so drop it to get the object key
    client.upload_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    return True

Loading:

def from_s3_npy(s3_uri: str):
    bytes_ = BytesIO()
    parsed_s3 = urlparse(s3_uri)
    client.download_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    bytes_.seek(0)
    return np.load(bytes_, allow_pickle=True)
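
Used together, the two helpers give a disk-free round trip. A short usage sketch; the bucket name below is a placeholder:

a = np.array([1, 2, 3, 4])
to_s3_npy(a, "s3://my-bucket/test1.npy")
b = from_s3_npy("s3://my-bucket/test1.npy")
assert np.array_equal(a, b)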
answered Sep 20 '22 by Wesley Cheek