I have an array of values, say v (e.g. v=[1,2,3,4,5,6,7,8,9,10]), and an array of indexes, say g (e.g. g=[0,0,0,0,1,1,1,1,2,2]).
I know, for instance, how to take the first element of each group in a very numpythonic way, by doing:
import numpy as np
v=np.array([1,2,3,4,74,73,72,71,9,10])
g=np.array([0,0,0,0,1,1,1,1,2,2])
mask=np.concatenate(([True],np.diff(g)!=0))  # True where a new group starts
v[mask]
returns:
array([1, 74, 9])
Is there any numpythonic way (avoiding explicit loops) to get the maximum of each subset?
Since I received two good answers, one with the Python map and one with a numpy routine, and I was looking for the best-performing one, here are some timing tests:
import numpy as np
import time

N = 10000000
v = np.arange(N)
Nelems_per_group = 10
Ngroups = N // Nelems_per_group      # integer division, so the group labels are ints
s = np.arange(Ngroups)
g = np.repeat(s, Nelems_per_group)   # 10 consecutive elements per group

# First method: reduceat on the first index of each group
start1 = time.time()
r = np.maximum.reduceat(v, np.unique(g, return_index=True)[1])
end1 = time.time()
print('END first method, T=', (end1 - start1), 's')

# Second method: split at group boundaries and map np.max over the pieces
start2 = time.time()
np.array(list(map(np.max, np.split(v, np.where(np.diff(g) != 0)[0] + 1))))
end2 = time.time()
print('END second method, (map returns an iterable) T=', (end2 - start2), 's')
As a result I get:
END first method, T= 1.6057236194610596 s
END second method, (map returns an iterable) T= 8.346540689468384 s
Interestingly, most of the slowdown of the map method is due to the list() call, which I cannot avoid because in Python 3.x map returns an iterator (https://docs.python.org/3/library/functions.html#map).
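For reference, one way to see how much of that time is the list() call is to exhaust the map iterator without materializing a list, e.g. with collections.deque(..., maxlen=0). This is a minimal sketch, assuming v and g are the same arrays as in the timing test above:

from collections import deque
import time
import numpy as np

# Sketch: consume the map iterator without building a list, to isolate
# the cost of list() itself (assumes v and g from the timing test above)
start = time.time()
deque(map(np.max, np.split(v, np.where(np.diff(g) != 0)[0] + 1)), maxlen=0)
end = time.time()
print('map method without list() conversion, T=', (end - start), 's')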
You can use np.maximum.reduceat:
>>> _, idx = np.unique(g, return_index=True)
>>> np.maximum.reduceat(v, idx)
array([ 4, 74, 10])
More about the workings of the ufunc reduceat method can be found in its documentation.
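To make the reduceat semantics concrete, here is a small sketch on the arrays from the question: for an increasing index array idx, output element j is the reduction of v[idx[j]:idx[j+1]], and the last one runs to the end of v.

import numpy as np

v = np.array([1, 2, 3, 4, 74, 73, 72, 71, 9, 10])
idx = np.array([0, 4, 8])             # first index of each group
np.maximum.reduceat(v, idx)           # array([ 4, 74, 10])
# equivalent to [v[0:4].max(), v[4:8].max(), v[8:].max()]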
Remark about performance
np.maximum.reduceat is very fast. Generating the indices idx is what takes most of the time here.
While _, idx = np.unique(g, return_index=True) is an elegant way to get the indices, it is not particularly quick. The reason is that np.unique needs to sort the array first, which is O(n log n) in complexity. For large arrays, this is much more expensive than using several O(n) operations to generate idx.
Therefore, for large arrays it is much faster to use the following instead:
idx = np.concatenate([[0], 1+np.diff(g).nonzero()[0]])
np.maximum.reduceat(v, idx)
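As a quick sanity check (a sketch, assuming g labels contiguous groups as in the question), both index constructions agree and give the same groupwise maxima:

import numpy as np

v = np.array([1, 2, 3, 4, 74, 73, 72, 71, 9, 10])
g = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])

idx_unique = np.unique(g, return_index=True)[1]
idx_fast = np.concatenate([[0], 1 + np.diff(g).nonzero()[0]])

assert np.array_equal(idx_unique, idx_fast)   # both are [0, 4, 8]
np.maximum.reduceat(v, idx_fast)              # array([ 4, 74, 10])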