Boolean operations on scipy.sparse matrices

Tags: python, scipy, sparse-matrix

I have a set of sparse matrices filled with boolean values that I need to perform logical operations on (mostly element-wise OR).

As in NumPy, summing matrices with dtype='bool' gives the element-wise OR; however, there's a nasty side effect:

>>> from scipy import sparse
>>> [a,b] = [sparse.rand(5,5,density=0.1,format='lil').astype('bool')
...  for x in range(2)]
>>> b
<5x5 sparse matrix of type '<class 'numpy.bool_'>'
    with 2 stored elements in LInked List format>
>>> a+b
<5x5 sparse matrix of type '<class 'numpy.int8'>'
    with 4 stored elements in Compressed Sparse Row format>

The data type gets changed to 'int8', which causes problems for future operations. This can be worked around with:

(a+b).astype('bool')

But I get the impression that all this type changing would cause a performance hit.

Why is the dtype of the result different from the operands?
And is there a better way to do logical operations on sparse matrices in Python?

asked Jan 24 '13 21:01

TheONP

2 Answers

Logical operations are not supported for sparse matrices, but converting back to 'bool' is not all that expensive. In fact, with LIL format matrices, the conversion can appear to take negative time because of timing fluctuations:

import scipy.sparse

a = scipy.sparse.rand(10000, 10000, density=0.001, format='lil').astype('bool')
b = scipy.sparse.rand(10000, 10000, density=0.001, format='lil').astype('bool')

In [2]: %timeit a+b
10 loops, best of 3: 61.2 ms per loop

In [3]: %timeit (a+b).astype('bool')
10 loops, best of 3: 60.4 ms per loop

You may have noticed that your LIL matrices were converted to CSR format before being added together; look at the format of the result. If you use CSR format to begin with, the conversion overhead becomes more noticeable:

In [14]: %timeit a+b
100 loops, best of 3: 2.28 ms per loop

In [15]: %timeit (a+b).astype(bool)
100 loops, best of 3: 2.96 ms per loop
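Since the addition converts LIL to CSR internally, one option is to do that conversion once up front. The following is a minimal sketch, not a benchmark; the matrix sizes, the seed, and the variable names are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative data only: two random boolean patterns with about 1% True.
rng = np.random.default_rng(0)
a_dense = rng.random((200, 200)) < 0.01
b_dense = rng.random((200, 200)) < 0.01

# LIL is convenient while a matrix is being built up...
a_lil = sp.lil_matrix(a_dense)
b_lil = sp.lil_matrix(b_dense)

# ...but convert once to CSR before doing repeated arithmetic.
a_csr = a_lil.tocsr()
b_csr = b_lil.tocsr()

# Element-wise OR. The cast keeps the result boolean on SciPy versions
# where the sum is upcast to an integer dtype, as in the question.
result = (a_csr + b_csr).astype(bool)
print(result.format, result.dtype)
```

If many matrices are OR-ed together, keeping everything in CSR avoids paying the format conversion on every addition.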

CSR (and CSC) matrices have a data attribute which is a 1D array that holds the actual non-zero entries of the sparse matrix, so the cost of recasting your sparse matrix will depend on the number of non-zero entries of your matrix, not its size:

a = scipy.sparse.rand(10000, 10000, density=0.0005, format='csr').astype('int8')
b = scipy.sparse.rand(1000, 1000, density=0.5, format='csr').astype('int8')

In [4]: %timeit a.astype('bool') # a is 10,000x10,000 with 50,000 non-zero entries
10000 loops, best of 3: 93.3 us per loop

In [5]: %timeit b.astype('bool') # b is 1,000x1,000 with 500,000 non-zero entries
1000 loops, best of 3: 1.7 ms per loop
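To see why the cost follows the number of stored entries, it can help to look at the arrays behind a CSR matrix. This is a small sketch with made-up sizes; it only inspects the `data`, `indices`, and `indptr` attributes of a CSR matrix.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative data only: a 100x100 int8 matrix with about 5% ones.
rng = np.random.default_rng(0)
dense = (rng.random((100, 100)) < 0.05).astype(np.int8)
m = sp.csr_matrix(dense)

# `data` holds one value per stored entry, regardless of the matrix shape.
print(m.shape, m.nnz, m.data.size, m.data.dtype)

# Casting produces a matrix whose `data` has the new dtype;
# the sparsity structure (indices, indptr) has the same contents.
m_bool = m.astype(bool)
print(m_bool.data.dtype)
```

The work done by the cast is proportional to `m.data.size`, which is why the 1,000x1,000 matrix with 500,000 entries above is slower to cast than the 10,000x10,000 matrix with 50,000 entries.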

answered Oct 18 '22 15:10

Jaime

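Boolean operations on sparse matrices can be expressed with ordinary arithmetic and comparison operators:

a.multiply(b) #AND
a+b           #OR
(a>b)+(a<b)   #XOR
a>b           #a AND NOT b

Note that `a>b` is not a general NOT: it gives the entries set in `a` but not in `b`. A true element-wise NOT would turn every implicit zero into True and so produce an essentially dense result.

So Boolean operations are supported.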
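As a sanity check, these expressions can be compared against the corresponding dense NumPy operations. This is a sketch under a few assumptions: the matrices are in CSR format, the sizes and seed are arbitrary, and every result is cast with `.astype(bool)` in case the installed SciPy version upcasts the dtype. The last expression is labelled here as "a AND NOT b", which is what `a > b` computes for boolean inputs.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative data only: two random 20x20 boolean patterns.
rng = np.random.default_rng(0)
a_dense = rng.random((20, 20)) < 0.2
b_dense = rng.random((20, 20)) < 0.2
a = sp.csr_matrix(a_dense)
b = sp.csr_matrix(b_dense)

# Each result is cast back to bool so the dtype is the same on every SciPy version.
and_ = a.multiply(b).astype(bool)          # element-wise AND
or_ = (a + b).astype(bool)                 # element-wise OR
xor_ = ((a > b) + (a < b)).astype(bool)    # element-wise XOR
a_and_not_b = (a > b).astype(bool)         # set in a but not in b

# Compare against the dense equivalents.
print(np.array_equal(and_.toarray(), a_dense & b_dense))
print(np.array_equal(or_.toarray(), a_dense | b_dense))
print(np.array_equal(xor_.toarray(), a_dense ^ b_dense))
print(np.array_equal(a_and_not_b.toarray(), a_dense & ~b_dense))
```

All of these keep the result sparse because an entry can only be True where at least one operand already has a stored entry.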

answered Oct 18 '22 15:10

Radio Controlled
