Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy, get maximum of subsets

I have an array of values, said v, (e.g. v=[1,2,3,4,5,6,7,8,9,10]) and an array of indexes, say g (e.g. g=[0,0,0,0,1,1,1,1,2,2]).

I know, for instance, how to take the first element of each group, in a very numpythonic way, doing:

import numpy as np
v=np.array([1,2,3,4,74,73,72,71,9,10])
g=np.array([0,0,0,0,1,1,1,1,2,2])
mask=np.concatenate(([True],np.diff(g)!=0))
v[mask]

returns:

array([1, 74, 9])

Is there any numpythonic way (avoiding explicit loops) to get the maximum of each subset?


Tests:

Since I received two good answers, one with the python map and one with a numpy routine, and I was searching the most performing, here some timing tests:

import numpy as np
import time
N=10000000
v=np.arange(N)
Nelemes_per_group=10
Ngroups=N/Nelemes_per_group
s=np.arange(Ngroups)
g=np.repeat(s,Nelemes_per_group)

start1=time.time()
r=np.maximum.reduceat(v, np.unique(g, return_index=True)[1])
end1=time.time()
print('END first method, T=',(end1-start1),'s')

start3=time.time()
np.array(list(map(np.max,np.split(v,np.where(np.diff(g)!=0)[0]+1))))
end3=time.time()
print('END second method,  (map returns an iterable) T=',(end3-start3),'s')

As a result I get:

END first method, T= 1.6057236194610596 s
END second method,  (map returns an iterable) T= 8.346540689468384 s

Interestingly, most of the slowdown of the map method is due to the list() call. If I do not try to reconvert my map result to a list ( but I have to, because python3.x returns an iterator: https://docs.python.org/3/library/functions.html#map )

like image 619
Antonio Ragagnin Avatar asked Dec 10 '15 17:12

Antonio Ragagnin


People also ask

How do you find the maximum of a Numpy array?

You can use this built-in max() to find the maximum element in a one-dimensional NumPy array, but it has no support for arrays with more dimensions.

What is AMAX Numpy?

numpy. amax() returns the maximum of an array. The return type is ndarray or scalar, depending on the input. If an axis is specified, the output is an array of dimension a. ndim - 1.

Which of the following function helps find the maximum value number in the Numpy?

max() and numpy. min() functions we can find the maximum and minimum element. Here, we get the maximum and minimum value from the whole array.


1 Answers

You can use np.maximum.reduceat:

>>> _, idx = np.unique(g, return_index=True)
>>> np.maximum.reduceat(v, idx)
array([ 4, 74, 10])

More about the workings of the ufunc reduceat method can be found here.


Remark about performance

np.maximum.reduceat is very fast. Generating the indices idx is what takes most of the time here.

While _, idx = np.unique(g, return_index=True) is an elegant way to get the indices, it is not particularly quick.

The reason is that np.unique needs to sort the array first, which is O(n log n) in complexity. For large arrays, this is much more expensive than using several O(n) operations to generate idx.

Therefore, for large arrays it is much faster to use the following instead:

idx = np.concatenate([[0], 1+np.diff(g).nonzero()[0]])
np.maximum.reduceat(v, idx)
like image 183
Alex Riley Avatar answered Oct 03 '22 21:10

Alex Riley