Numpy sum elements in array based on its value

Tags: performance, python, arrays, numpy

I have an unsorted array of indices:

i = np.array([1,5,2,6,4,3,6,7,4,3,2])

I also have an array of values of the same length:

v = np.array([2,5,2,3,4,1,2,1,6,4,2])

I have a zero-filled array to hold the desired values:

d = np.zeros(10)

Now I want to add each value in v to the element of d whose position is given by the corresponding entry in i.

In plain Python I would do it like this:

for index,value in enumerate(v):
    idx = i[index]
    d[idx] += v[index]

This is ugly and inefficient. How can I improve it?

asked Nov 20 '15 02:11

Mihail Kondratyev

2 Answers

np.add.at(d, i, v)

You'd think d[i] += v would work, but when the same index appears more than once in i, only one of the additions to that cell takes effect and the others are lost. The ufunc.at method performs the operation unbuffered, so every occurrence of a repeated index is accumulated.
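A minimal sketch of that difference, using the arrays from the question. The names buffered and unbuffered are only illustrative. Which of the duplicated values survives in the buffered case is not something to rely on, so the example only compares totals.

```python
import numpy as np

i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])

# Buffered fancy-index addition: a repeated index keeps only one of its values.
buffered = np.zeros(10)
buffered[i] += v

# Unbuffered addition: every occurrence of an index contributes to the sum.
unbuffered = np.zeros(10)
np.add.at(unbuffered, i, v)

print(buffered.sum() < v.sum())   # True: some additions were dropped
print(unbuffered.tolist())        # [0.0, 2.0, 4.0, 5.0, 10.0, 5.0, 5.0, 1.0, 0.0, 0.0]
```

The unbuffered total equals v.sum(), which is a quick sanity check that nothing was lost.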

answered Oct 14 '22 15:10

user2357112 supports Monica

We can use np.bincount, which is efficient for this kind of weighted accumulation. When d starts out as all zeros, the weighted counts can simply be assigned into it -

counts = np.bincount(i,v)
d[:counts.size] = counts

Alternatively, for the generic case where d may already hold values and we want to add into it, pass the minlength argument so that the result has the same length as d -

d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)
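As a rough illustration of the minlength variant, here is a sketch that adds into an array that is not all zeros. The integer dtype and the starting values of d are assumptions chosen for the example. np.bincount returns float64 when weights are given, which is why the cast back to d.dtype is needed before the in-place addition.

```python
import numpy as np

i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])

# Assumed starting state: an integer array that already holds values.
d = np.ones(10, dtype=np.int64)

# Weighted counts padded to len(d); the result is float64.
sums = np.bincount(i, weights=v, minlength=d.size)

# Cast to d's dtype so the in-place add is allowed for integer d.
d += sums.astype(d.dtype, copy=False)

print(d.tolist())  # [1, 3, 5, 6, 11, 6, 6, 2, 1, 1]
```

Note that minlength only pads the result. If i contains an index greater than or equal to d.size, the bincount output is longer than d and the addition fails with a shape error.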

Runtime tests

This section compares the np.add.at approach from the other answer with the np.bincount approach listed earlier in this answer.

In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...: 
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:     

In [62]: # Inputs (random numbers)
    ...: N = 10000
    ...: i = np.random.randint(0,1000,(N))
    ...: v = np.random.randint(0,1000,(N))
    ...: 
    ...: # Setup output arrays for two approaches
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [63]: bincount_based(d1,i,v) # Run approaches
    ...: add_at_based(d2,i,v)
    ...: 

In [64]: np.allclose(d1,d2)  # Verify outputs
Out[64]: True

In [67]: # Setup output arrays for two approaches again for timing
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop

In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop
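For anyone who wants to repeat the comparison outside IPython, the session above could be reproduced with the standard timeit module along these lines. This is only a sketch: the seed, the repeat counts, and the use of fresh zero arrays per call are choices made here, and the measured gap will depend on the NumPy version and the hardware.

```python
import timeit
import numpy as np

def bincount_based(d, i, v):
    counts = np.bincount(i, weights=v)
    d[:counts.size] = counts          # assumes d starts as zeros

def add_at_based(d, i, v):
    np.add.at(d, i, v)

# Same input sizes as the session above; the seed is arbitrary.
rng = np.random.default_rng(0)
N, M = 10_000, 12_000
i = rng.integers(0, 1000, N)
v = rng.integers(0, 1000, N)

# Verify that both approaches agree before timing them.
d1, d2 = np.zeros(M), np.zeros(M)
bincount_based(d1, i, v)
add_at_based(d2, i, v)
same = np.allclose(d1, d2)

# Best-of-3 time per call, each call working on a fresh zero array.
runs = 100
t_bincount = min(timeit.repeat(lambda: bincount_based(np.zeros(M), i, v),
                               number=runs, repeat=3)) / runs
t_add_at = min(timeit.repeat(lambda: add_at_based(np.zeros(M), i, v),
                             number=runs, repeat=3)) / runs

print(f"outputs match: {same}")
print(f"bincount: {t_bincount * 1e6:.1f} µs per call")
print(f"add.at:   {t_add_at * 1e6:.1f} µs per call")
```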

answered Oct 14 '22 15:10

Divakar
