I ran into the following issue trying to vstack two large CSR matrices:
/usr/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in _check(self)
229 raise ValueError('negative row index found')
230 if self.col.min() < 0:
--> 231 raise ValueError('negative column index found')
232
233 def transpose(self, copy=False):
ValueError: negative column index found
I can reproduce this error very simply by converting a large LIL matrix to a COO matrix. The following code works for N=10**9 but fails for N=10**10.
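from scipy import sparse
from numpy import random

N = 10**10
x = sparse.lil_matrix((1, N))
# Fill 1000 randomly chosen columns of the single row with small integers.
for _ in range(1000):
    x[0, random.randint(0, N - 1)] = random.randint(1, 100)
# The conversion below is where the ValueError is raised.
y = sparse.coo_matrix(x)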
Is there a size limit I am hitting for coo matrices? Is there a way around this?
Looks like you're hitting the limits of 32-bit integers. Here's a quick test:
In [14]: np.array([10**9, 10**10], dtype=np.int64)
Out[14]: array([ 1000000000, 10000000000])
In [15]: np.array([10**9, 10**10], dtype=np.int32)
Out[15]: array([1000000000, 1410065408], dtype=int32)
For now, most SciPy sparse matrix formats store their indices as 32-bit integers, so any index above 2**31 - 1 wraps around (often to a negative value, hence the error) and matrices that large simply cannot be supported.
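To see where the negative index comes from, here is a minimal sketch of the wraparound. The helper name wrap_to_int32 and the sample column values are made up for illustration; the cast just keeps the low 32 bits, which I am assuming is roughly what happens to the column indices internally:

```python
import numpy as np

def wrap_to_int32(values):
    """Cast 64-bit integers to int32, keeping only the low 32 bits."""
    return np.asarray(values, dtype=np.int64).astype(np.int32)

# Hypothetical column indices: one that fits in int32 and two that do not.
cols = [10**9, 3 * 10**9, 10**10]
wrapped = wrap_to_int32(cols)

print(np.iinfo(np.int32).max)  # largest index an int32 can hold
print(wrapped)                 # 3 * 10**9 comes out negative
```

An index like 3 * 10**9 lands in the negative half of the int32 range, which is what the check in coo.py complains about. 10**10 itself happens to wrap to a positive value, so the failure depends on which columns are actually populated.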
EDIT: As of version 0.14, SciPy supports 64-bit indexing. If you can upgrade, this problem will go away.
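If you want to confirm that your install is using 64-bit indices, something like the following should work. This is a sketch that assumes SciPy 0.14 or later; the column positions and values are arbitrary, chosen so that some of them exceed the int32 range:

```python
import numpy as np
from scipy import sparse

N = 10**10

# Arbitrary example entries in a single row; two columns exceed 2**31 - 1.
rows = np.zeros(3, dtype=np.int64)
cols = np.array([0, 3 * 10**9, N - 1], dtype=np.int64)
data = np.array([1.0, 2.0, 3.0])

y = sparse.coo_matrix((data, (rows, cols)), shape=(1, N))
print(y.col.dtype)  # index dtype chosen by SciPy for this shape
print(y.col.min())  # should not be negative

# The original use case: stacking two large CSR matrices.
stacked = sparse.vstack([y.tocsr(), y.tocsr()])
print(stacked.shape)
```

If the printed index dtype is still int32, or the minimum column is negative, you are most likely on an older SciPy.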