I'm trying to sum the float values of one vector according to the integer values of another vector.
For instance, if I have:
import numpy as np
a = np.array([0.1,0.2,0.3,0.4,0.5,0.6,7.3,0.8,0.9,1.,1.2,1.4])
b = np.array([0,0,0,0,0,1,1,1,2,2,2,2]).astype(int)
I would like to add the first 5 values of the a vector together (because the first 5 values of b are 0), the next 3 values together (because the next 3 values of b are 1), and so on. So at the end I would expect to have
c = function(a,b)
c = [0.1+0.2+0.3+0.4+0.5, 0.6+7.3+0.8, 0.9+1.+1.2+1.4]
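For reference, here is a plain-Python sketch of the behaviour I'm after (the function name and the loop are just for illustration; I'd like an efficient numpy way to do this):
import numpy as np
from itertools import groupby

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 7.3, 0.8, 0.9, 1., 1.2, 1.4])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]).astype(int)

def group_sums(a, b):
    # sum consecutive runs of a that share the same value in b
    return [sum(x for _, x in grp) for _, grp in groupby(zip(b, a), key=lambda t: t[0])]

print(group_sums(a, b))  # approximately [1.5, 8.7, 4.5]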
Approach #1 : We can make use of np.bincount with b as the bins and a as the weights array -
In [203]: np.bincount(b,a)
Out[203]: array([1.5, 8.7, 4.5])
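In other words, np.bincount with a weights argument accumulates the sum of the weights that fall into each integer bin, which is exactly the grouped sum wanted here. A minimal self-contained sketch of the same call:
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 7.3, 0.8, 0.9, 1., 1.2, 1.4])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

# bin i of the result accumulates a[j] for every j with b[j] == i
c = np.bincount(b, weights=a)
print(c)  # [1.5 8.7 4.5]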
Approach #2 : Another one leveraging matrix-multiplication -
In [210]: (b == np.arange(b.max()+1)[:,None]).dot(a)
Out[210]: array([1.5, 8.7, 4.5])
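To unpack that one-liner: it builds a boolean one-hot matrix with one row per label and lets the dot product do the per-row sums. A sketch of the same idea in separate steps:
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 7.3, 0.8, 0.9, 1., 1.2, 1.4])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

labels = np.arange(b.max() + 1)   # [0 1 2]
mask = (b == labels[:, None])     # shape (3, 12), True where b equals each label
c = mask.dot(a)                   # each row of the mask picks out one group of a
print(c)                          # [1.5 8.7 4.5]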
For a pure numpy solution, you can check np.diff() of b, which will give you a new array of zeros everywhere except wherever the values change. However, this needs one small tweak, as np.diff() reduces the size of your array by one element, so your indices will be off by one. There is actually ongoing development in numpy to make this better (adding new arguments to pad the output back to the original size; see the issue here: https://github.com/numpy/numpy/issues/8132)
With that said...here's something that should be instructive:
In [100]: a
Out[100]: array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 7.3, 0.8, 0.9, 1. , 1.2, 1.4])
In [101]: b
Out[101]: array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
In [102]: np.diff(b) # note it is one element shorter than b
Out[102]: array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0])
In [103]: np.flatnonzero(np.diff(b))
Out[103]: array([4, 7])
In [104]: np.flatnonzero(np.diff(b)) + 1
Out[104]: array([5, 8])
In [105]: np.insert(np.flatnonzero(np.diff(b)) + 1, 0, 0)
Out[105]: array([0, 5, 8]) # these are the indices of the start of each group
In [106]: indices = _
In [107]: np.add.reduceat(a, indices)
Out[107]: array([1.5, 8.7, 4.5])
In [108]: def sumatchanges(a, b):
...: indices = np.insert(np.flatnonzero(np.diff(b)) + 1, 0, 0)
...: return np.add.reduceat(a, indices)
...:
In [109]: sumatchanges(a, b)
Out[109]: array([1.5, 8.7, 4.5])
In most settings I would definitely prefer using Pandas groupby as jpp's answer does, since this is ugly. Hopefully with those changes to numpy, it could look a bit nicer and more natural in the future.
Note that this answer is equivalent (in output) to the itertools.groupby answer that Maarten gave. Specifically, the groups are assumed to be sequential. I.e., this
b = np.array([0,0,0,0,0,1,1,1,2,2,2,2]).astype(int)
would produce the same output as
b = np.array([0,0,0,0,0,1,1,1,0,0,0,0]).astype(int)
The actual number is irrelevant, as long as it changes. However, the other solution Maarten gave and the pandas solution by jpp will sum everything with the same label, regardless of location. The question does not make clear which behaviour is preferred.
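A quick check of that claim, reusing a from the question and the sumatchanges function above (only the change points in b matter, not the labels themselves):
b1 = np.array([0,0,0,0,0,1,1,1,2,2,2,2])
b2 = np.array([0,0,0,0,0,1,1,1,0,0,0,0])

# both split a at positions 5 and 8, so the grouped sums are identical
print(sumatchanges(a, b1))  # [1.5 8.7 4.5]
print(sumatchanges(a, b2))  # [1.5 8.7 4.5]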
Here I'll create a random array for summing and a random non-decreasing array of labels, with 100k entries each, and time both functions:
In [115]: import timeit
In [116]: import pandas as pd
In [117]: def sumatchangespd(a, b):
...: return pd.Series(a).groupby(b).sum().values
...:
In [125]: l = 100_000
In [126]: a = np.random.rand(l)
In [127]: b = np.cumsum(np.random.randint(2, size=l))
In [128]: sumatchanges(a, b)
Out[128]:
array([2.83528234e-01, 6.66182064e-01, 9.32624292e-01, ...,
2.98379765e-01, 1.97586484e+00, 8.65103445e-04])
In [129]: %timeit sumatchanges(a, b)
1.91 ms ± 47.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [130]: %timeit sumatchangespd(a, b)
6.33 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Also just to make sure these are equivalent:
In [139]: all(np.isclose(sumatchanges(a, b), sumatchangespd(a, b)))
Out[139]: True
So the numpy version is faster (not too surprising). Again, these functions could do slightly different things though, depending on your input:
In [120]: b # numpy solution grabs each chunk as a separate piece
Out[120]: array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
In [121]: b[-4:] = 0
In [122]: b # pandas will sum the vals in a that have same vals in b
Out[122]: array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0])
In [123]: sumatchanges(a, b)
Out[123]: array([1.5, 8.7, 4.5])
In [124]: sumatchangespd(a, b)
Out[124]: array([6. , 8.7])
Divakar's main solution is brilliant and the best out of all of the above speed-wise:
In [144]: def sumatchangesbc(a, b):
...: return np.bincount(b,a)
...:
In [145]: %timeit sumatchangesbc(a, b)
175 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Order of magnitude faster than my numpy solution.
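For completeness, here is a small self-contained script (the function names are my own) that collects the three approaches and checks that they agree on consecutive groups:
import numpy as np
import pandas as pd

def sum_reduceat(a, b):
    # sum consecutive runs: split a wherever b changes value
    idx = np.insert(np.flatnonzero(np.diff(b)) + 1, 0, 0)
    return np.add.reduceat(a, idx)

def sum_bincount(a, b):
    # sum by label: bin i collects every a[j] with b[j] == i
    return np.bincount(b, weights=a)

def sum_pandas(a, b):
    # sum by label via pandas groupby
    return pd.Series(a).groupby(b).sum().values

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 7.3, 0.8, 0.9, 1., 1.2, 1.4])
b = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

print(sum_reduceat(a, b))   # [1.5 8.7 4.5]
print(sum_bincount(a, b))   # [1.5 8.7 4.5]
print(sum_pandas(a, b))     # [1.5 8.7 4.5]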