
Python: lil_matrix vs csr_matrix in extremely large sparse matrices

I want to build an extremely large sparse matrix incrementally. The problem is that lil_matrix takes so much RAM that it becomes inefficient. For example, creating a 20 million x 20 million lil_matrix blows my RAM completely. On the other hand, csr_matrix barely takes any space, but it is allegedly inefficient for modifications. Is there any other way to get the benefit of lil_matrix without taking so much RAM? And why does it take so much space in the first place, given that it is supposed to be a sparse matrix?

Note: The real problem is actually not creating such a big matrix, but instead creating the following list:

import scipy.sparse as sp

mats = [sp.lil_matrix((150, 150)) for i in range(1000)]

which also blows up my RAM.
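(Editor's note: one common workaround, not part of the original question, is to skip incremental mutation entirely: accumulate the entries in plain Python lists and build a coo_matrix once at the end, then convert to csr. A minimal sketch, assuming the standard scipy.sparse API:)

```python
import scipy.sparse as sp

# Collect triplets (row, col, value) in ordinary lists; appending to a
# Python list is cheap compared to modifying a csr_matrix in place.
rows, cols, vals = [], [], []
rows.append(3); cols.append(7); vals.append(1.5)    # incremental updates
rows.append(0); cols.append(2); vals.append(-2.0)

# Build the sparse matrix once, at the end, in COO form and convert.
m = sp.coo_matrix((vals, (rows, cols)), shape=(150, 150)).tocsr()
print(m.nnz)  # 2
```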

Michael asked Sep 27 '22 16:09


1 Answer

I don't claim to have a full answer, but I feel like you will get there if you look at the matrix internals.

In [11]: from scipy import sparse

In [12]: s = sparse.csr_matrix((5,5))

In [13]: s.__dict__
Out[13]: 
{'_shape': (5, 5),
 'data': array([], dtype=float64),
 'format': 'csr',
 'indices': array([], dtype=int32),
 'indptr': array([0, 0, 0, 0, 0, 0], dtype=int32),
 'maxprint': 50}

In [14]: s.indptr.nbytes
Out[14]: 24

In [15]: l = sparse.lil_matrix((5,5))

In [16]: l.__dict__
Out[16]: 
{'_shape': (5, 5),
 'data': array([[], [], [], [], []], dtype=object),
 'dtype': dtype('float64'),
 'format': 'lil',
 'maxprint': 50,
 'rows': array([[], [], [], [], []], dtype=object)}

In [17]: l.data.nbytes
Out[17]: 40

In [18]: l.rows.nbytes
Out[18]: 40
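(Editor's note: scaling the numbers above gives a rough picture of where the RAM goes. A back-of-the-envelope sketch, assuming 64-bit CPython, where an empty list is 56 bytes, and the lil/csr layouts shown in the transcript:)

```python
import sys
from scipy import sparse

# One row of a lil_matrix costs a pointer in `data`, a pointer in
# `rows`, and two empty Python lists, even before any value is stored.
l = sparse.lil_matrix((5, 5))
per_row = (l.data.nbytes + l.rows.nbytes) / 5 + 2 * sys.getsizeof([])
print(per_row)  # ~128 bytes per row on 64-bit CPython

n = 20_000_000
print(per_row * n / 1e9)  # roughly 2.5 GB for an *empty* 20M-row lil_matrix

# An empty csr_matrix of the same shape only needs indptr, n + 1 int32s:
s = sparse.csr_matrix((5, 5))
print((n + 1) * s.indptr.itemsize / 1e6)  # roughly 80 MB, and nothing per row
```

So lil's per-row Python objects, not the stored values, are what blow up the 20M-row case (and likewise the list of 1000 lil matrices, each of which carries its own per-row lists).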
Akavall answered Sep 30 '22 07:09