I want to build an extremely large sparse matrix incrementally. The problem is that lil_matrix takes so much RAM that it becomes impractical: creating a 20 million x 20 million lil_matrix exhausts my RAM completely. A csr_matrix of the same shape, on the other hand, barely takes any space, but csr_matrix is reportedly inefficient to modify. Is there another way to get the benefits of lil_matrix without such a large RAM footprint? And why does it take so much space in the first place, given that it is supposed to be a sparse matrix?
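For reference, here is a minimal reproduction of what I am doing (shapes as in my real use case):

import scipy.sparse as sp

# This line alone exhausts my RAM:
m = sp.lil_matrix((20_000_000, 20_000_000))

# The same shape as a csr_matrix allocates almost nothing:
c = sp.csr_matrix((20_000_000, 20_000_000))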
Note: The real problem is actually not creating one such big matrix, but creating the following list:
matrices = [sp.lil_matrix((150, 150)) for _ in range(1000)]
which also blows up my RAM.
I don't claim to have a full answer, but I feel like you will get there if you look at the matrix internals.
In [12]: s = sparse.csr_matrix((5,5))
In [13]: s.__dict__
Out[13]:
{'_shape': (5, 5),
'data': array([], dtype=float64),
'format': 'csr',
'indices': array([], dtype=int32),
'indptr': array([0, 0, 0, 0, 0, 0], dtype=int32),
'maxprint': 50}
In [14]: s.indptr.nbytes
Out[14]: 24
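So the only per-row storage an empty csr_matrix carries is indptr, which holds one entry per row plus one; its size scales with the number of rows, not with the number of columns or stored values. Extrapolating to your shape (a sketch; the exact byte count depends on the index dtype scipy picks, and 20 million fits in int32):

In [14b]: big = sparse.csr_matrix((20_000_000, 20_000_000))

In [14c]: big.indptr.nbytes   # (n_rows + 1) * 4 bytes: about 80 MB
Out[14c]: 80000004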
In [15]: l = sparse.lil_matrix((5,5))
In [16]: l.__dict__
Out[16]:
{'_shape': (5, 5),
'data': array([[], [], [], [], []], dtype=object),
'dtype': dtype('float64'),
'format': 'lil',
'maxprint': 50,
'rows': array([[], [], [], [], []], dtype=object)}
In [17]: l.data.nbytes
Out[17]: 40
In [18]: l.rows.nbytes
Out[18]: 40
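Here is the catch: lil_matrix keeps two object arrays, data and rows, each holding one separate Python list per row of the matrix. The 40 bytes reported by nbytes only count the five 8-byte object pointers; the five distinct empty list objects they point to cost roughly 56 bytes apiece on top of that (CPython, 64-bit). So an empty n-row lil_matrix costs on the order of n * 2 * (8 + 56) bytes before a single value is stored, which for 20 million rows is a couple of GB. The same accounting explains your list of 1000 matrices: every 150x150 lil_matrix carries 300 empty Python lists of its own.

If your workflow is "build incrementally, then compute", one common pattern (a sketch, not tied to your exact workload) is to accumulate (row, col, value) triplets in flat Python lists and do a single conversion at the end, so nothing is allocated per matrix row:

import scipy.sparse as sparse

rows, cols, vals = [], [], []

def add_entry(i, j, v):
    # Appending to flat lists is cheap; no per-row structures exist.
    rows.append(i)
    cols.append(j)
    vals.append(v)

add_entry(0, 1, 3.0)
add_entry(19_999_999, 5, 7.5)

# One conversion at the end; duplicate (i, j) entries are summed.
m = sparse.coo_matrix((vals, (rows, cols)),
                      shape=(20_000_000, 20_000_000)).tocsr()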