
Save a 3D numpy array to disk at high speed

I have a numpy array of size (192,192,4000) that I would like to write to disk quickly. I don't care about the format; I can convert it afterwards.

What I do right now is save it in CSV format, which takes a long time:

for i in range(0,192):
        np.savetxt(foder+"/{}_{}.csv".format(filename,i), data[i] , "%i", delimiter=", ")

This takes 20-25 seconds. I have already tried the pandas DataFrame and Panel approaches from other Stack Overflow questions, as well as numpy.save. All of them seem to run without error, but the folder is empty when I open it.

Any idea how to improve the speed?

Why does the code run without error but save nothing, for example with numpy.save?

Farnaz asked Dec 15 '25 03:12



2 Answers

Usually the fastest way to save a large array like yours is to write it as a binary file, which numpy's save function does. For example, the following creates a 3D array filled with zeros, writes it to a file, and then loads it back:

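import numpy

a = numpy.zeros((192,192,4000))  # float64 by default
numpy.save("mydata.npy",a)       # write the array in binary .npy format
b = numpy.load("mydata.npy")     # read it back with the same shape and dtype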

After the save call, the file "mydata.npy" should be in the current working directory.
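If the folder still looks empty after numpy.save, one common cause is that a relative file name is resolved against the process's current working directory, which is not necessarily the folder you are looking at. The sketch below is only an illustration under assumptions: the output folder is a temporary directory, the array is a small stand-in for the real one, and an integer dtype is assumed because the question formats values with "%i". It builds an explicit path, saves, and then checks that the file exists.

```python
import os
import tempfile

import numpy as np

# Hypothetical output folder; a temporary directory keeps the sketch safe to run.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "mydata.npy")

# Small stand-in for the (192, 192, 4000) array. The int32 dtype is an
# assumption based on the "%i" format used in the question.
data = np.arange(4 * 4 * 10, dtype=np.int32).reshape(4, 4, 10)

np.save(out_path, data)

# Confirm where the file landed and that it is really there.
print("saved to:", os.path.abspath(out_path))
print("exists:", os.path.exists(out_path))

# Load it back and compare with the original.
restored = np.load(out_path)
print("round trip ok:", np.array_equal(restored, data))
```

Printing os.path.abspath of the target path is a quick way to see which folder a relative file name actually points to.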

Rajesh Venkatesan answered Dec 16 '25 16:12



numpy.savetxt only accepts 1D or 2D arrays, so if you want a single text file you can reshape your array from 3D to 2D before saving. See the following code for an example.

import numpy as np

arr = np.random.rand(5, 4, 3)

# Reshape the 3D array to 2D: keep the first axis, flatten the other two.
arr_reshaped = arr.reshape(arr.shape[0], -1)

# Save the reshaped array to a text file.
np.savetxt("geekfile.txt", arr_reshaped)

# Load the data back from the file.
loaded_arr = np.loadtxt("geekfile.txt")

# loaded_arr is 2D, so reshape it back to the original 3D shape.
load_original_arr = loaded_arr.reshape(
    loaded_arr.shape[0], loaded_arr.shape[1] // arr.shape[2], arr.shape[2])

# Check the shapes:
print("shape of arr: ", arr.shape)
print("shape of load_original_arr: ", load_original_arr.shape)

# Check whether both arrays are the same:
if (load_original_arr == arr).all():
    print("Yes, both the arrays are same")
else:
    print("No, both the arrays are not same")
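Since the question is about speed, it may be worth timing the text route against the binary one on your own machine. The following is a rough sketch, not a benchmark: the array size, the random integer data, the temporary folder, and the file names are all assumptions chosen to keep it quick to run.

```python
import os
import tempfile
import time

import numpy as np

# Small integer array standing in for the real (192, 192, 4000) data.
rng = np.random.default_rng(0)
arr = rng.integers(0, 1000, size=(32, 32, 100))

# Hypothetical output location and file names.
out_dir = tempfile.mkdtemp()
txt_path = os.path.join(out_dir, "arr.txt")
npy_path = os.path.join(out_dir, "arr.npy")

# Text route: reshape to 2D, then write with savetxt.
start = time.perf_counter()
np.savetxt(txt_path, arr.reshape(arr.shape[0], -1), fmt="%i")
text_seconds = time.perf_counter() - start

# Binary route: write the 3D array directly with save.
start = time.perf_counter()
np.save(npy_path, arr)
binary_seconds = time.perf_counter() - start

print(f"savetxt: {text_seconds:.4f} s, save: {binary_seconds:.4f} s")

# Both routes should give back the same values.
from_text = np.loadtxt(txt_path, dtype=arr.dtype).reshape(arr.shape)
from_binary = np.load(npy_path)
print("text ok:", np.array_equal(from_text, arr))
print("binary ok:", np.array_equal(from_binary, arr))
```

Note that the .npy file stores the shape and dtype, so no reshape is needed after loading, whereas the text file loses the 3D shape and it has to be restored by hand.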
liedji answered Dec 16 '25 18:12



