Is there a MATLAB accumarray equivalent in numpy?

Tags:

python

numpy

accumulator

I'm looking for a fast equivalent of MATLAB's accumarray in numpy. accumarray accumulates the elements of an array that belong to the same index. An example:

a = np.arange(1,11)
# array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
accmap = np.array([0,1,0,0,0,1,1,2,2,1])

Result should be

array([13, 25, 17])

What I've done so far: I've tried the accum function in the recipe here, which works fine but is slow.

accmap = np.repeat(np.arange(1000), 20)
a = np.random.randn(accmap.size)
%timeit accum(accmap, a, np.sum)
# 1 loops, best of 3: 293 ms per loop

Then I tried the solution here, which is supposed to be faster, but it doesn't work correctly:

accum_np(accmap, a)
# array([  1.,   2.,  12.,  13.,  17.,  10.])

Is there a built-in numpy function that can do accumulation like this? Or any other recommendations?

asked May 31 '13 11:05

2 Answers

Use np.bincount with the weights optional argument. In your example you would do:

np.bincount(accmap, weights=a)
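
As a quick check, here is that call applied to the small example from the question. This is only a sketch: the variable names `sums` and `padded` are mine, and the `minlength` call is an extra illustration that is not part of the answer above.

```python
import numpy as np

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])

# Sum the entries of `a` that share the same label in `accmap`.
# With `weights`, bincount returns a float64 array.
sums = np.bincount(accmap, weights=a)
print(sums.tolist())  # [13.0, 25.0, 17.0]

# `minlength` pads the output with zeros when the highest labels are unused.
padded = np.bincount(accmap, weights=a, minlength=5)
print(padded.tolist())  # [13.0, 25.0, 17.0, 0.0, 0.0]
```

Note that the labels must be non-negative integers, and that the result is floating point even when `a` holds integers, so cast it back with `astype` if you need an integer result.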

answered Oct 09 '22 21:10

Jaime

Late to the party, but...

As @Jaime says, for the case of summing, np.bincount is fast and simple. However, in the more general case, for other ufuncs such as maximum, you can use the np.ufunc.at method.
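
A minimal sketch of the ufunc.at approach, using the data from the question. The names `group_max` and `group_prod` are placeholders of mine; the key point is that the output array has to be pre-filled with a suitable starting value for the ufunc in question.

```python
import numpy as np

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])
n_groups = accmap.max() + 1

# Group-wise maximum: start from the smallest representable integer so
# that any real element replaces it.
group_max = np.full(n_groups, np.iinfo(a.dtype).min, dtype=a.dtype)
np.maximum.at(group_max, accmap, a)   # unbuffered, in-place update
print(group_max.tolist())  # [5, 10, 9]

# Group-wise product: the starting value is 1.
group_prod = np.ones(n_groups, dtype=np.int64)
np.multiply.at(group_prod, accmap, a)
print(group_prod.tolist())  # [60, 840, 72]
```

Unlike `out[accmap] = ...`, the `.at` method applies the operation once per element even when a label is repeated, which is what makes it usable for accumulation.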

I've put together ~~a gist~~[see link below instead] which encapsulates this in a Matlab-like interface. It also takes advantage of the repeated indexing rules to provide a 'last' and 'first' function, and unlike Matlab, 'mean' is sensibly optimized (calling accumarray with @mean in Matlab is really slow because it runs a non-builtin function for every single group, which is stupid).
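
To illustrate why 'mean' does not need a per-group function call, here is one possible vectorised version built from two bincount calls. This is a sketch of the general idea and an assumption on my part, not necessarily what the gist does internally.

```python
import numpy as np

a = np.arange(1, 11)
accmap = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])

# Per-group sum and per-group count, each computed in a single pass.
sums = np.bincount(accmap, weights=a)
counts = np.bincount(accmap)

# Element-wise division gives the per-group mean without any Python loop.
means = sums / counts
print(means.tolist())  # [3.25, 6.25, 8.5]
```

If some label never occurs, its count is 0 and the division yields nan with a runtime warning, so a real implementation would need to decide on a fill value for empty groups.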

Be warned that I haven't particularly tested the gist, but will hopefully update it in future with extra features and bugfixes.

Update May/June-2015: I have reworked my implementation - it is now part of ml31415/numpy-groupies and available on PyPI (pip install numpy-groupies). Benchmarks are as follows (see github repo for up-to-date values)...

function  pure-py  np-grouploop   np-ufuncat np-optimised    pandas        ratio
     std  1737.8ms       171.8ms     no-impl       7.0ms    no-impl   247.1: 24.4:  -  : 1.0 :  -  
     all  1280.8ms        62.2ms      41.8ms       6.6ms    550.7ms   193.5: 9.4 : 6.3 : 1.0 : 83.2
     min  1358.7ms        59.6ms      42.6ms      42.7ms     24.5ms    55.4: 2.4 : 1.7 : 1.7 : 1.0 
     max  1538.3ms        55.9ms      38.8ms      37.5ms     18.8ms    81.9: 3.0 : 2.1 : 2.0 : 1.0 
     sum  1532.8ms        62.6ms      40.6ms       1.9ms     20.4ms   808.5: 33.0: 21.4: 1.0 : 10.7
     var  1756.8ms       146.2ms     no-impl       6.3ms    no-impl   279.1: 23.2:  -  : 1.0 :  -  
    prod  1448.8ms        55.2ms      39.9ms      38.7ms     20.2ms    71.7: 2.7 : 2.0 : 1.9 : 1.0 
     any  1399.5ms        69.1ms      41.1ms       5.7ms    558.8ms   246.2: 12.2: 7.2 : 1.0 : 98.3
    mean  1321.3ms        88.3ms     no-impl       4.0ms     20.9ms   327.6: 21.9:  -  : 1.0 : 5.2 
Python 2.7.9, Numpy 1.9.2, Win7 Core i7.

Here we are using 100,000 indices uniformly picked from [0, 1000). Specifically, about 25% of the values are 0 (for use with bool operations), the remainder are uniformly distributed on [-50,25). Timings are shown for 10 repeats.
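
For anyone who wants to reproduce something similar, test data matching that description could be generated roughly as follows. This is a guess at the setup, not the actual benchmark script; the seed and the names `idx` and `vals` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n = 100_000

# 100,000 group labels drawn uniformly from [0, 1000).
idx = rng.integers(0, 1000, size=n)

# Values uniform on [-50, 25), then roughly 25% of them forced to zero.
vals = rng.uniform(-50, 25, size=n)
vals[rng.random(n) < 0.25] = 0

print(idx.min(), idx.max(), np.mean(vals == 0))
```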

pure-py - uses nothing but pure python, relying partly on itertools.groupby.
np-grouploop - uses numpy to sort values based on idx, then uses split to create separate arrays, and then loops over these arrays, running the relevant numpy function for each array.
np-ufuncat - uses the numpy ufunc.at method, which is slower than it ought to be, as discussed in an issue I created on numpy's github repo.
np-optimised - uses custom numpy indexing/other tricks to beat the above two implementations (except for min, max, and prod, which rely on ufunc.at).
pandas - pd.DataFrame({'idx':idx, 'vals':vals}).groupby('idx').sum() etc.
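
To make the first two strategies concrete, here is a rough sketch of what a pure-py and an np-grouploop implementation might look like. The function names `accum_pure_py` and `accum_grouploop` are hypothetical and the code is my own illustration of the descriptions above, not the code that was benchmarked.

```python
from itertools import groupby

import numpy as np


def accum_pure_py(idx, vals, func, size):
    """Pure-Python version: sort (label, value) pairs, then group by label."""
    out = [0] * size
    pairs = sorted(zip(idx, vals), key=lambda pair: pair[0])
    for label, group in groupby(pairs, key=lambda pair: pair[0]):
        out[label] = func([value for _, value in group])
    return out


def accum_grouploop(idx, vals, func):
    """Sort by label, split into one array per group, loop over the groups."""
    order = np.argsort(idx, kind="stable")
    sorted_idx, sorted_vals = idx[order], vals[order]
    # Positions where the sorted label changes mark the start of a new group.
    starts = np.flatnonzero(np.diff(sorted_idx)) + 1
    groups = np.split(sorted_vals, starts)
    labels = sorted_idx[np.concatenate(([0], starts))]
    out = np.zeros(idx.max() + 1)  # labels that never occur stay at 0
    for label, group in zip(labels, groups):
        out[label] = func(group)
    return out


idx = np.array([0, 1, 0, 0, 0, 1, 1, 2, 2, 1])
vals = np.arange(1, 11)
print(accum_pure_py(idx.tolist(), vals.tolist(), sum, 3))  # [13, 25, 17]
print(accum_grouploop(idx, vals, np.sum).tolist())         # [13.0, 25.0, 17.0]
```

Both versions still call `func` once per group from Python, which is the overhead that the ufunc.at and optimised variants try to avoid.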

Note that some of the no-impls may be unwarranted, but I haven't bothered to get them working yet.

As explained on github, accumarray now supports nan-prefixed functions (e.g. nansum) as well as sort, rsort, and array. It also works with multidimensional indexing.
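
If you only need a nansum over a multidimensional index and don't want the extra dependency, the same effect can be approximated in plain numpy. The sketch below is an assumption about one way to do it (flatten the index with np.ravel_multi_index, drop the nans, then bincount); it does not show the library's own API or internals.

```python
import numpy as np

# Each value is assigned to a (row, col) cell of a 2x2 output grid.
rows = np.array([0, 0, 1, 1, 1])
cols = np.array([0, 1, 1, 1, 0])
vals = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
shape = (2, 2)

# Convert the 2-D labels to flat labels so bincount can be used.
flat_idx = np.ravel_multi_index((rows, cols), shape)

# nansum behaviour: simply leave the nan entries out of the accumulation.
keep = ~np.isnan(vals)
flat_sums = np.bincount(flat_idx[keep], weights=vals[keep],
                        minlength=shape[0] * shape[1])

result = flat_sums.reshape(shape)
print(result.tolist())  # [[1.0, 2.0], [5.0, 3.0]]
```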

answered Oct 09 '22 21:10

dan-man


petrichor
