I have an array of values, say v (e.g. v=[1,2,3,4,5,6,7,8,9,10]), and an array of indexes, say g (e.g. g=[0,0,0,0,1,1,1,1,2,2]).
I know, for instance, how to take the first element of each group in a very numpythonic way, by doing:
import numpy as np
v=np.array([1,2,3,4,74,73,72,71,9,10])
g=np.array([0,0,0,0,1,1,1,1,2,2])
mask=np.concatenate(([True],np.diff(g)!=0))  # True where a new group starts
v[mask]
returns:
array([1, 74, 9])
Is there any numpythonic way (avoiding explicit loops) to get the maximum of each subset?
Since I received two good answers, one with the Python map and one with a numpy routine, and I was looking for the best-performing one, here are some timing tests:
import numpy as np
import time

N = 10000000
v = np.arange(N)
Nelems_per_group = 10
Ngroups = N // Nelems_per_group      # integer division, so the group labels are ints
s = np.arange(Ngroups)
g = np.repeat(s, Nelems_per_group)   # 10 consecutive elements per group

# First method: reduceat on the first index of each group
start1 = time.time()
r = np.maximum.reduceat(v, np.unique(g, return_index=True)[1])
end1 = time.time()
print('END first method, T=', (end1 - start1), 's')

# Second method: split at group boundaries and map np.max over the pieces
start2 = time.time()
np.array(list(map(np.max, np.split(v, np.where(np.diff(g) != 0)[0] + 1))))
end2 = time.time()
print('END second method, (map returns an iterable) T=', (end2 - start2), 's')
As a result I get:
END first method, T= 1.6057236194610596 s
END second method, (map returns an iterable) T= 8.346540689468384 s
Interestingly, most of the slowdown of the map method is due to the list() call, which I cannot avoid because in Python 3.x map returns an iterator (https://docs.python.org/3/library/functions.html#map).
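For reference, one way to see how much of that time is the list() call is to exhaust the map iterator without materializing a list, e.g. with collections.deque(..., maxlen=0). This is a minimal sketch, assuming v and g are the same arrays as in the timing test above:

from collections import deque
import time
import numpy as np

# Sketch: consume the map iterator without building a list, to isolate
# the cost of list() itself (assumes v and g from the timing test above)
start = time.time()
deque(map(np.max, np.split(v, np.where(np.diff(g) != 0)[0] + 1)), maxlen=0)
end = time.time()
print('map method without list() conversion, T=', (end - start), 's')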
You can use np.maximum.reduceat:
>>> _, idx = np.unique(g, return_index=True)
>>> np.maximum.reduceat(v, idx)
array([ 4, 74, 10])
More about the workings of the ufunc reduceat method can be found in its documentation.
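To make the reduceat semantics concrete, here is a small sketch on the arrays from the question: for an increasing index array idx, output element j is the reduction of v[idx[j]:idx[j+1]], and the last one runs to the end of v.

import numpy as np

v = np.array([1, 2, 3, 4, 74, 73, 72, 71, 9, 10])
idx = np.array([0, 4, 8])             # first index of each group
np.maximum.reduceat(v, idx)           # array([ 4, 74, 10])
# equivalent to [v[0:4].max(), v[4:8].max(), v[8:].max()]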
Remark about performance
np.maximum.reduceat is very fast. Generating the indices idx is what takes most of the time here.
While _, idx = np.unique(g, return_index=True) is an elegant way to get the indices, it is not particularly quick. The reason is that np.unique needs to sort the array first, which is O(n log n) in complexity. For large arrays, this is much more expensive than using several O(n) operations to generate idx.
Therefore, for large arrays it is much faster to use the following instead:
idx = np.concatenate([[0], 1+np.diff(g).nonzero()[0]])
np.maximum.reduceat(v, idx)
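As a quick sanity check (a sketch, assuming g labels contiguous groups as in the question), both index constructions agree and give the same groupwise maxima:

import numpy as np

v = np.array([1, 2, 3, 4, 74, 73, 72, 71, 9, 10])
g = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])

idx_unique = np.unique(g, return_index=True)[1]
idx_fast = np.concatenate([[0], 1 + np.diff(g).nonzero()[0]])

assert np.array_equal(idx_unique, idx_fast)   # both are [0, 4, 8]
np.maximum.reduceat(v, idx_fast)              # array([ 4, 74, 10])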