
Python/Numpy: Efficiently store non-sparse large symmetric arrays?

I am trying to store data on my hard drive that comes in the form of 2 million symmetric 100x100 matrices. Almost all elements of these matrices are non-zero. I am currently saving this data in 200 npy files, each of which is 5.1GB and contains a 100000x100x100 NumPy array. This takes up more than 1TB of hard drive space.

Is there any way that I can use the fact that the matrices are symmetric to save space on my hard drive?

Alice Schwarze asked Oct 17 '22 00:10


1 Answer

To store only the upper triangle of each matrix (including the diagonal), you should be able to do something like:

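import numpy as np

data = np.load([filename])

# Row and column indices of the upper triangle, diagonal included
iu = np.triu_indices(100)

# Keep only the upper-triangle entries of each 100x100 matrix
flat = []
for a in data:
    flat.append(a[iu])

# Each flattened triangle is stored as a separate array in the .npz file
np.savez([filename], *flat)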
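As a rough check on how much this saves: an n x n matrix has n(n+1)/2 entries on or above the diagonal, so for n = 100 only about half of the values need to be written. A minimal sketch of that arithmetic (the variable names are just for illustration):

```python
import numpy as np

n = 100  # matrix size from the question

# Number of entries on or above the diagonal of an n x n matrix
kept = n * (n + 1) // 2

# Fraction of the original entries that still need to be stored
fraction = kept / (n * n)

# np.triu_indices returns one (row, column) index pair per kept entry
rows, cols = np.triu_indices(n)

print(kept, fraction, len(rows))
```

So each matrix shrinks from 10000 stored values to 5050, a bit more than half of the original size, before any file-format overhead.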

And then to load them back:

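import numpy as np

flat = np.load([filename])

data = []

# np.load on an .npz file returns a dict-like object, so iterate over its items
for name, a in flat.items():
    arr = np.zeros((100,100),dtype=[dtype])
    arr[np.triu_indices(100)] = a
    # Mirror the upper triangle, then subtract the diagonal, which was counted twice
    arr = arr + arr.T - np.diag(arr.diagonal())
    data.append(arr)

data = np.array(data)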
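
The same idea can probably be done without Python-level loops by indexing the whole stack at once. This is only a sketch, not a tested replacement for the code above: `pack_symmetric` and `unpack_symmetric` are hypothetical helper names, and the small random stack stands in for one of the real files.

```python
import numpy as np

def pack_symmetric(stack):
    """Return the upper triangles (diagonal included) of a stack of matrices.

    stack has shape (m, n, n); the result has shape (m, n*(n+1)//2).
    """
    n = stack.shape[-1]
    rows, cols = np.triu_indices(n)
    return stack[:, rows, cols]

def unpack_symmetric(packed, n):
    """Rebuild the full (m, n, n) symmetric stack from packed upper triangles."""
    m = packed.shape[0]
    rows, cols = np.triu_indices(n)
    out = np.empty((m, n, n), dtype=packed.dtype)
    out[:, rows, cols] = packed  # fill the upper triangle and the diagonal
    out[:, cols, rows] = packed  # mirror into the lower triangle
    return out

# Small demonstration on random symmetric matrices (stand-in data)
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5, 5))
sym = a + a.transpose(0, 2, 1)

packed = pack_symmetric(sym)
restored = unpack_symmetric(packed, 5)
print(packed.shape, np.array_equal(restored, sym))
```

The packed result is a single 2-D array, so it could be written with np.save and read back with np.load as one .npy file instead of an .npz archive holding many small arrays.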
user545424 answered Oct 21 '22 05:10