
Load npy file from S3 in Python

Is there any way to load/read an external file (e.g. from AWS S3) in NumPy? I have several npy files stored in S3. I have tried to access them through an S3 presigned URL, but it seems neither numpy.load nor np.genfromtxt is able to read them.

I don't want to save the files to the local file system and then load them into NumPy.

Any ideas?

Ivan Fernandez asked Nov 15 '16 11:11


2 Answers

Using s3fs

import numpy as np
from s3fs.core import S3FileSystem

s3 = S3FileSystem()

key = 'your_file.npy'
bucket = 'your_bucket'

# Open the S3 object as a file-like object and let NumPy read from it
with s3.open('{}/{}'.format(bucket, key), 'rb') as f:
    arr = np.load(f)

Depending on the file, you might have to pass allow_pickle=True to np.load, for example when it contains object arrays.
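As a complement, here is a minimal sketch of loading from bytes that are already in memory, which is roughly what a presigned URL download gives you. np.load expects a path or a seekable file-like object rather than a URL, so the bytes are wrapped in io.BytesIO first. The helper names load_npy_from_bytes and load_npy_from_url are hypothetical, and fetching with urllib is an assumption; any HTTP client that returns the raw body should work the same way.

```python
import io
import urllib.request

import numpy as np


def load_npy_from_bytes(data, allow_pickle=False):
    """Load a .npy/.npz payload that is already in memory as bytes."""
    # np.load needs a seekable file-like object, so wrap the raw bytes.
    return np.load(io.BytesIO(data), allow_pickle=allow_pickle)


def load_npy_from_url(url, allow_pickle=False):
    """Download a URL (e.g. an S3 presigned URL) and load it without touching disk."""
    # Hypothetical helper: reads the whole response body into memory first.
    with urllib.request.urlopen(url) as response:
        return load_npy_from_bytes(response.read(), allow_pickle=allow_pickle)


if __name__ == "__main__":
    # Local round trip, no network needed: serialize an array to bytes, then load it back.
    buffer = io.BytesIO()
    np.save(buffer, np.arange(6).reshape(2, 3))
    restored = load_npy_from_bytes(buffer.getvalue())
    print(restored.shape)
```

Note that the whole file is held in memory, so this only suits files that fit in RAM.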

hru_d answered Oct 10 '22 03:10


I've compared s3fs and io.BytesIO for loading a 28 GB npz file from S3. s3fs took 30 min, while io.BytesIO took 12 min.

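import io
import boto3
import numpy as np
from s3fs import S3FileSystem

# io.BytesIO: read the whole object into memory first (s3_session is a boto3 session)
obj = s3_session.resource("s3").Object(bucket, key)
with io.BytesIO(obj.get()["Body"].read()) as f:
    X, y = np.load(f).values()

# s3fs: let NumPy read through the S3 file object
fs = S3FileSystem()
with fs.open(f"s3://{bucket}/{key}") as s3file:
    X, y = np.load(s3file).values()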
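To reproduce a comparison like this on your own data, a small timing helper is enough. The sketch below is only an illustration: time_loader is a hypothetical name, and the in-memory payload stands in for the S3 object so that the example runs without AWS credentials. In practice you would pass in one function per loading approach.

```python
import io
import time

import numpy as np


def time_loader(loader, label):
    """Run a zero-argument loader once and report how long it took."""
    start = time.perf_counter()
    result = loader()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


if __name__ == "__main__":
    # Stand-in payload so the sketch runs without S3 access.
    payload = io.BytesIO()
    np.savez(payload, X=np.zeros((100, 10)), y=np.zeros(100))
    data = payload.getvalue()

    def load_from_memory():
        # Arrays are read while the buffer is still open.
        with io.BytesIO(data) as f:
            npz = np.load(f)
            return npz["X"], npz["y"]

    (X, y), seconds = time_loader(load_from_memory, "io.BytesIO")
```

A single run is noisy, especially over the network, so repeating each measurement a few times gives a fairer picture.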
Jing Xue answered Oct 10 '22 04:10