
Error converting large sparse matrix to COO

I ran into the following issue trying to vstack two large CSR matrices:

    /usr/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in _check(self)
        229                 raise ValueError('negative row index found')
        230             if self.col.min() < 0:
    --> 231                 raise ValueError('negative column index found')
        232
        233     def transpose(self, copy=False):

    ValueError: negative column index found

I can reproduce this error very simply by converting a large LIL matrix to a COO matrix. The following code works for N=10**9 but fails for N=10**10.

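    from scipy import sparse
    from numpy import random

    N = 10**10  # works for N = 10**9, fails for N = 10**10
    x = sparse.lil_matrix((1, N))
    # Fill 1000 randomly chosen columns of the single row with random values.
    for _ in range(1000):
        x[0, random.randint(0, N - 1)] = random.randint(1, 100)

    y = sparse.coo_matrix(x)  # the ValueError above is raised here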

Is there a size limit I am hitting for coo matrices? Is there a way around this?

canzar asked Jul 17 '14 20:07



1 Answer

Looks like you're hitting the limits of 32-bit integers: the largest int32 value is 2**31 - 1 = 2147483647, so 10**10 does not fit and wraps around. Here's a quick test:

    In [14]: np.array([10**9, 10**10], dtype=np.int64)
    Out[14]: array([ 1000000000, 10000000000])

    In [15]: np.array([10**9, 10**10], dtype=np.int32)
    Out[15]: array([1000000000, 1410065408], dtype=int32)

For now, most of scipy's sparse matrix representations assume 32-bit integer indices, so they simply cannot address a dimension that large.
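To see where a negative column index can come from, here is a minimal sketch of the wraparound itself. The sample values are my own (not taken from the question), and it uses an explicit `astype` cast rather than scipy's internal conversion, so treat it as an illustration of the mechanism only.

```python
import numpy as np

# Largest value a signed 32-bit integer can hold.
INT32_MAX = np.iinfo(np.int32).max

# Hypothetical column indices, stored safely as 64-bit integers.
indices = np.array([10**9, 3 * 10**9, 10**10], dtype=np.int64)

# An unsafe cast keeps only the low 32 bits, so values above
# INT32_MAX wrap around and can come out negative.
wrapped = indices.astype(np.int32)

print(INT32_MAX)
print(wrapped)
print((wrapped < 0).any())
```

Any index whose low 32 bits land in the upper half of the range turns negative after the cast, and a negative column index is exactly what the `_check` method in the traceback complains about.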

EDIT: As of version 0.14, scipy supports 64-bit indices in sparse matrices. If you can upgrade, this problem should go away.
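If you want to check this on your own install, something along these lines should do. It is a sketch that assumes a reasonably recent scipy and numpy on a 64-bit platform; it builds the COO matrix directly from index arrays instead of going through LIL, and the variable names are just for illustration.

```python
import numpy as np
from scipy import sparse

N = 10**10   # column count that no longer fits in int32
nnz = 1000   # number of stored entries

rng = np.random.default_rng(0)
rows = np.zeros(nnz, dtype=np.int64)                  # single row
cols = rng.integers(0, N, size=nnz, dtype=np.int64)   # columns in [0, N)
vals = rng.integers(1, 100, size=nnz)                 # values in [1, 100)

# Build the COO matrix; the index arrays should be kept as 64-bit.
y = sparse.coo_matrix((vals, (rows, cols)), shape=(1, N))
print(y.col.dtype, y.col.min() >= 0)

# The original problem was a vstack of two CSR matrices, so try that too.
csr = y.tocsr()
stacked = sparse.vstack([csr, csr])
print(stacked.shape, stacked.indices.dtype)
```

If the index dtype reported is `int64` and no `ValueError` is raised, the install has the 64-bit index support.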

perimosocordiae answered Oct 12 '22 14:10
