I would like to know if there is any way to write an array as a NumPy file (.npy) directly to an AWS S3 bucket. I can use np.save
to save a file locally, as shown below, but I am looking for a way to write it directly to S3 without saving it locally first.
import numpy as np
a = np.array([1, 2, 3, 4])
np.save('/my/localfolder/test1.npy', a)
NPY files store all the information required to reconstruct an array on any computer, including dtype and shape information. NumPy is a Python library that provides support for large arrays and matrices. You can export an array to an NPY file by using np.save('filename.
If you want to bypass your local disk and upload the data directly to the cloud, you may want to use pickle instead of a .npy file:
import io
import pickle
import boto3
import numpy
s3_client = boto3.client('s3')
my_array = numpy.random.randn(10)
# upload without using disk
my_array_data = io.BytesIO()
pickle.dump(my_array, my_array_data)
my_array_data.seek(0)
s3_client.upload_fileobj(my_array_data, 'your-bucket', 'your-file.pkl')
# download without using disk
my_array_data2 = io.BytesIO()
s3_client.download_fileobj('your-bucket', 'your-file.pkl', my_array_data2)
my_array_data2.seek(0)
my_array2 = pickle.load(my_array_data2)
# check that everything is correct
numpy.allclose(my_array, my_array2)
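The seek(0) calls in the snippet above are easy to overlook. A minimal sketch of why they matter, using only the standard library (the payload dictionary is a hypothetical stand-in for the array): after a write, the stream position sits at the end of the buffer, so anything that reads from the current position sees no data until the buffer is rewound.

```python
import io
import pickle

# Hypothetical stand-in for the array; any picklable object behaves the same way
payload = {"values": [1.0, 2.0, 3.0]}

buffer = io.BytesIO()
pickle.dump(payload, buffer)

# The stream position is now at the end of the buffer,
# so a read from here returns no bytes.
unread = buffer.read()

# Rewinding makes the full payload visible to the next reader
buffer.seek(0)
restored = pickle.load(buffer)
```

The same reasoning should apply to the download side, where the buffer has just been written to and must be rewound before pickle.load reads it.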
I've recently had issues with s3fs dependency conflicts with boto3, so I try to avoid using it. This solution only depends on boto3, does not write to disk, and does not explicitly use pickle.
Saving:
from io import BytesIO
import numpy as np
from urllib.parse import urlparse
import boto3
client = boto3.client("s3")
def to_s3_npy(data: np.ndarray, s3_uri: str):
    # s3_uri looks like f"s3://{BUCKET_NAME}/{KEY}"
    bytes_ = BytesIO()
    np.save(bytes_, data, allow_pickle=True)
    # rewind so the upload reads from the start of the buffer
    bytes_.seek(0)
    parsed_s3 = urlparse(s3_uri)
    client.upload_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    return True
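To make the URI handling in to_s3_npy easier to follow, here is a small sketch of how urlparse splits an s3:// URI into a bucket and a key. The helper name split_s3_uri and the example URI are hypothetical and not part of the answer; the slicing mirrors the parsed_s3.path[1:] used above.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # netloc holds the bucket name; path starts with '/', which is dropped
    return parsed.netloc, parsed.path[1:]


bucket, key = split_s3_uri("s3://your-bucket/arrays/test1.npy")
# bucket == "your-bucket", key == "arrays/test1.npy"
```

One caveat under this approach: urlparse treats ? and # as query and fragment separators, so keys containing those characters would need separate handling.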
Loading:
def from_s3_npy(s3_uri: str):
    bytes_ = BytesIO()
    parsed_s3 = urlparse(s3_uri)
    client.download_fileobj(
        Fileobj=bytes_, Bucket=parsed_s3.netloc, Key=parsed_s3.path[1:]
    )
    bytes_.seek(0)
    return np.load(bytes_, allow_pickle=True)
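Both functions above pass allow_pickle=True. For a plain numeric array like the one in the question, that flag is probably not needed. The sketch below runs the same save and load steps against an in-memory buffer, with no S3 involved, using allow_pickle=False, which avoids unpickling data when the file comes from a source you do not fully trust. It is an illustration of the buffer round trip only, not a replacement for the functions above.

```python
from io import BytesIO

import numpy as np

a = np.array([1, 2, 3, 4])

# Serialize to the .npy format in memory instead of on disk
buffer = BytesIO()
np.save(buffer, a, allow_pickle=False)

# Rewind, then read the array back from the same buffer
buffer.seek(0)
b = np.load(buffer, allow_pickle=False)
```

Arrays with dtype=object cannot be saved this way and would still require allow_pickle=True.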