I have a list of points in 3d coordinates system (X, Y, Z). Moreover, each of them have assigned a float value v, so a single point might be described as (x, y, z, v). This list is represented as a numpy array of shape=(N,4). For each 2d position x, y I need to get the maximum value of v. A straightforward but computationally expensive way would be:
for index in range(points.shape[0]):
x = points[index, 0]
y = points[index, 1]
v = points[index, 3]
maxes[x, y] = np.max(maxes[x, y], v)
Is there a more "numpy" approach, which would be able to bring some gain in terms of performance?
nanmax() to find the maximum values while ignoring nan values, as well as np. argmax() or . argmax() to find the indices of the maximum values. You won't be surprised to learn that NumPy has an equivalent set of minimum functions: np.
In order to get the indices of N maximum values in a NumPy array, we can use the argsort() function.
maximum() function is used to find the element-wise maximum of array elements. It compares two arrays and returns a new array containing the element-wise maxima.
To get the maximum value of all the elements in a NumPy array, use the numpy.max () function. Pass the array to evaluate as the first argument of the function and store the result in a variable. Here is the syntax of the .max () function from the NumPy Python package:
If we use 1 instead of 0, will get a list like [11 16 81], which contain the maximum number from each row. Example 4: If we have two same shaped NumPy arrays, we can find the maximum or minimum elements.
1 Allow user to enter the length of the list. 2 Next, iterate the for loop and add the number in the list. 3 Iterate for loop with list and use if statement to find the min and max number and their position in the list. 4 Print the results.
Below is an example of getting the largest number in a list: To get the index of the maximum value in a list, use the Python arr.index () function in conjunction with the max () function like this: As you are probably aware list indexes start at 0 which is why 2 is the result in the above example.
Setup
points = np.array([[ 0, 0, 1, 1],
[ 0, 0, 2, 2],
[ 1, 0, 3, 0],
[ 1, 0, 4, 1],
[ 0, 1, 5, 10]])
The general idea here is sorting using the first, second, and fourth columns, and reversing that result, so that when we find our unique values, the value with the maximum value in the fourth column will be above other values with similar x and y coordinates. Then we use np.unique
to find the unique values in the first and second columns, and return those results, which will have the maximum v
:
lexsort
and numpy.unique
def max_xy(a):
res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
vals, idx = np.unique(res[:, :2], 1, axis=0)
maximums = res[idx]
return maximums[:, [0,1,3]]
array([[ 0, 0, 2],
[ 0, 1, 10],
[ 1, 0, 1]])
unique
for better performancedef max_xy_v2(a):
res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
return res[:, [0,1,3]]
max_xy_v2(points)
array([[ 1, 0, 1],
[ 0, 1, 10],
[ 0, 0, 2]])
Notice that while both will return correct results, they will not be sorted as the original lists were, you can simply add another lexsort
at the end to fix this if you like.
Sorry, not a purely "numpy" solution either, but numpy_indexed package provides a very convenient (and fast) way to do this.
import numpy_indexed as npi
npi.group_by(points[:, 0:2]).max(points[:,3])
%timeit npi.group_by(points[:, 0:2]).max(points[:,3])
58 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit pd.DataFrame(points, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).apply(lambda r: r['V'].max()).reset_index().values
3.15 ms ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
def max_xy_ver1(a):
res = a[np.lexsort([a[:, 0], a[:, 1], a[:, 3]])[::-1]]
vals, idx = np.unique(res[:, :2], 1, axis=0)
maximums = res[idx]
return maximums[:, [0,1,3]]
%timeit max_xy_ver1(points)
63.5 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
def max_xy_ver2(a):
res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
return res[:, [0,1,3]]
%timeit_max_xy_ver2(points) # current winner
31.7 µs ± 524 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
def findmaxes_simple(points):
maxes = {}
for index in range(points.shape[0]):
x = points[index, 0]
y = points[index, 1]
v = points[index, 3]
maxes[(x, y)] = v if (x,y) not in maxes else max(maxes[(x, y)],v)
return maxes
%timeit findmaxes_simple(points)
82.6 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
pip install --user numpy_indexed
(If you are on Ubuntu and some other Linux distros, you may have to use pip3
to install the package for python 3)
Pastebin here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With