Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use numpy file without importing into RAM?

I want to use an numpy file (.npy) from Google Drive into Google Colab without importing it into the RAM.

I am working on Image Classification and have my image data into four numpy files in Google Drive. The collective size of the files is greater than 14 GB. Whereas Google Colab only offers 12 GB RAM for usage. Is there a way through which I can use it by loading only single batch at a time into the ram to train the model and removing it from the ram (maybe similar to flow_from_directory)?

The problem using flow_from_directory is that it is very slow even for one block of VGG16 even if I have images in Colab directory.

I am using Cats vs Dogs Classifier dataset from Kaggle.

! kaggle competitions download -c 'dogs-vs-cats'

I converted the image data into numpy array, and saved it in 4 files:

X_train - float32 - 10.62GB - (18941, 224, 224, 3)

X_test - float32 - 3.4GB - (6059, 224, 224, 3)

Y_train - float64 - 148KB - (18941)

Y_test - float64 - 47KB - (6059)

When I run the following code, the session crashes showing 'Your session crashed after using all available RAM.' error.

import numpy as np
X_train = np.load('Cat_Dog_Classifier/X_train.npy')
Y_train = np.load('Cat_Dog_Classifier/Y_train.npy')
X_test = np.load('Cat_Dog_Classifier/X_test.npy')
Y_test = np.load('Cat_Dog_Classifier/Y_test.npy')

Is there any way to use these 4 files without loading it into the RAM?

like image 649
Rahul Vishwakarma Avatar asked Oct 28 '25 10:10

Rahul Vishwakarma


1 Answers

You can do this by opening your file as a memory-mapped array.

For example:

import sys
import numpy as np

# Create a npy file
x = np.random.rand(1000, 1000)
np.save('mydata.npy', x)

# Load as a normal array
y = np.load('mydata.npy')
sys.getsizeof(y)
# 8000112

# Load as a memory-mapped array
y = np.load('mydata.npy', mmap_mode='r')
sys.getsizeof(y)
# 136

The second array acts like a normal array, but is backed by disk rather than RAM. Be aware that this will cause operations over the arrays to be much slower than normal RAM-backed arrays; often mem-mapping is used to conveniently access portions of the array without having to load the full array into RAM.

like image 80
jakevdp Avatar answered Oct 30 '25 01:10

jakevdp