I am trying to store data on my hard drive that comes in the form of 2 million symmetric 100x100 matrices. Almost all elements of these matrices are non-zero. I am currently saving this data in 200 npy files, each of which is 5.1GB in size and contains a 100000x100x100 NumPy array. This takes up more than 1TB of hard drive space.
Is there any way I can use the fact that the matrices are symmetric to save space on my hard drive?
To store only the upper half of the matrix (including the diagonal) you should be able to do something like:
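import numpy as np

data = np.load([filename])   # array of shape (N, 100, 100)
iu = np.triu_indices(100)    # indices of the upper triangle, diagonal included
flat = []
for a in data:
    flat.append(a[iu])       # keep only the upper-triangle values of each matrix
np.savez([filename], *flat)  # arrays are stored under the keys arr_0, arr_1, ...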
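As a rough check on how much this saves, the number of stored elements per matrix can be counted directly. This is only a sketch; the variable names are illustrative, and the actual file size also depends on the dtype and on the archive overhead.

```python
import numpy as np

n = 100  # matrix side length, as in the question
rows, cols = np.triu_indices(n)

kept = rows.size   # elements stored per matrix: n * (n + 1) // 2
total = n * n      # elements in the full matrix

print(kept, total, kept / total)
```

So each matrix shrinks to roughly half of its original element count.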
And then to load them back:
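import numpy as np

flat = np.load([filename])   # NpzFile holding arr_0, arr_1, ...
iu = np.triu_indices(100)
data = []
for name, a in flat.items():
    arr = np.zeros((100, 100), dtype=[dtype])  # same dtype as the saved arrays
    arr[iu] = a              # fill the upper triangle
    # mirror into the lower triangle; the diagonal is counted twice, so subtract it once
    arr = arr + arr.T - np.diag(arr.diagonal())
    data.append(arr)
data = np.array(data)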
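With 100000 matrices per file, the Python loops and one archive entry per matrix may be slow. A possible alternative, sketched here under the assumption that a whole (k, 100, 100) block fits in memory, is to pack all upper triangles at once into a single 2-D array. The helper names `pack_symmetric` and `unpack_symmetric` are hypothetical, and the test data is random, not the real data.

```python
import numpy as np

def pack_symmetric(stack):
    """Return the upper triangles of a (k, n, n) stack as a (k, n*(n+1)//2) array."""
    n = stack.shape[-1]
    rows, cols = np.triu_indices(n)
    return stack[:, rows, cols]

def unpack_symmetric(packed, n):
    """Rebuild the (k, n, n) symmetric stack from its packed upper triangles."""
    rows, cols = np.triu_indices(n)
    full = np.empty((packed.shape[0], n, n), dtype=packed.dtype)
    full[:, rows, cols] = packed  # upper triangle, diagonal included
    full[:, cols, rows] = packed  # mirror into the lower triangle
    return full

# Small synthetic example: random symmetric matrices standing in for the real data
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 100, 100))
sym = a + a.transpose(0, 2, 1)

packed = pack_symmetric(sym)
restored = unpack_symmetric(packed, 100)
print(packed.shape, np.array_equal(restored, sym))
```

The packed array can then be written with np.save as a single npy file per block, which keeps the one-file-per-block layout from the question.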