Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compound assignment operators in Python's Numpy library

Tags:

python

numpy

The "vectorizing" of fancy indexing by Python's numpy library sometimes gives unexpected results. For example:

import numpy
a = numpy.zeros((1000,4), dtype='uint32')
b = numpy.zeros((1000,4), dtype='uint32')
i = numpy.random.random_integers(0,999,1000)
j = numpy.random.random_integers(0,3,1000)

a[i,j] += 1
for k in xrange(1000):
    b[i[k],j[k]] += 1

Gives different results in the arrays 'a' and 'b' (i.e. the appearance of tuple (i,j) appears as 1 in 'a' regardless of repeats, whereas repeats are counted in 'b'). This is easily verified as follows:

numpy.sum(a)
883
numpy.sum(b)
1000

It is also notable that the fancy indexing version is almost two orders of magnitude faster than the for loop. My question is: "Is there an efficient way for numpy to compute the repeat counts as implemented using the for loop in the provided example?"

like image 560
user1451766 Avatar asked Jun 12 '12 16:06

user1451766


People also ask

What is compound assignment operator in Python?

The compound assignment operator performs the operation of the binary operator on both operands, and stores the result of that operation in the left operand (must be a modifiable value). The above expression is equivalent to a = a+10 . In the same way, we can subtract, multiply, and divide a number.

What is operator in Python NumPy?

NumPy performs operations element-by-element, so multiplying 2D arrays with * is not a matrix multiplication – it's an element-by-element multiplication. (The @ operator, available since Python 3.5, can be used for conventional matrix multiplication.)


1 Answers

This should do what you want:

np.bincount(np.ravel_multi_index((i, j), (1000, 4)), minlength=4000).reshape(1000, 4)

As a breakdown, ravel_multi_index converts the index pairs specified by i and j to integer indices into a C-flattened array; bincount counts the number of times each value 0..4000 appears in that list of indices; and reshape converts the C-flattened array back to a 2d array.

In terms of performance, I measure it at 200 times faster than "b", and 5 times faster than "a"; your mileage may vary.

Since you need to write the counts to an existing array a, try this:

u, inv = np.unique(np.ravel_multi_index((i, j), (1000, 4)), return_inverse=True)
a.flat[u] += np.bincount(inv)

I make this second method a little slower (2x) than "a", which isn't too surprising as the unique stage is going to be slow.

like image 188
ecatmur Avatar answered Oct 21 '22 03:10

ecatmur