Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy: proper way of getting maximum from a list of points

Tags:

python

max

numpy

I have a list of points in 3d coordinates system (X, Y, Z). Moreover, each of them have assigned a float value v, so a single point might be described as (x, y, z, v). This list is represented as a numpy array of shape=(N,4). For each 2d position x, y I need to get the maximum value of v. A straightforward but computationally expensive way would be:

for index in range(points.shape[0]):
    x = points[index, 0]
    y = points[index, 1]
    v = points[index, 3]

    maxes[x, y] = np.max(maxes[x, y], v)

Is there a more "numpy" approach, which would be able to bring some gain in terms of performance?

like image 728
thatsme Avatar asked Sep 19 '18 14:09

thatsme


People also ask

How do you find the maximum value in NumPy?

nanmax() to find the maximum values while ignoring nan values, as well as np. argmax() or . argmax() to find the indices of the maximum values. You won't be surprised to learn that NumPy has an equivalent set of minimum functions: np.

How do you get indices of N maximum values in a NumPy array?

In order to get the indices of N maximum values in a NumPy array, we can use the argsort() function.

Which of the following function helps find the maximum value number in the NumPy?

maximum() function is used to find the element-wise maximum of array elements. It compares two arrays and returns a new array containing the element-wise maxima.

How to get the maximum value of all the elements in NumPy?

To get the maximum value of all the elements in a NumPy array, use the numpy.max () function. Pass the array to evaluate as the first argument of the function and store the result in a variable. Here is the syntax of the .max () function from the NumPy Python package:

What happens if we use 1 instead of 0 in NumPy?

If we use 1 instead of 0, will get a list like [11 16 81], which contain the maximum number from each row. Example 4: If we have two same shaped NumPy arrays, we can find the maximum or minimum elements.

How to find Max and Min in a list in Python?

1 Allow user to enter the length of the list. 2 Next, iterate the for loop and add the number in the list. 3 Iterate for loop with list and use if statement to find the min and max number and their position in the list. 4 Print the results.

How to get the largest number in a Python list?

Below is an example of getting the largest number in a list: To get the index of the maximum value in a list, use the Python arr.index () function in conjunction with the max () function like this: As you are probably aware list indexes start at 0 which is why 2 is the result in the above example.


2 Answers

Setup

points = np.array([[ 0,  0,  1,  1],
                   [ 0,  0,  2,  2],
                   [ 1,  0,  3,  0],
                   [ 1,  0,  4,  1],
                   [ 0,  1,  5, 10]])

The general idea here is sorting using the first, second, and fourth columns, and reversing that result, so that when we find our unique values, the value with the maximum value in the fourth column will be above other values with similar x and y coordinates. Then we use np.unique to find the unique values in the first and second columns, and return those results, which will have the maximum v:

Using lexsort and numpy.unique

def max_xy(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    vals, idx = np.unique(res[:, :2], 1, axis=0)
    maximums = res[idx]
    return maximums[:, [0,1,3]]

array([[ 0,  0,  2],
       [ 0,  1, 10],
       [ 1,  0,  1]])

Avoiding unique for better performance

def max_xy_v2(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
    return res[:, [0,1,3]]

max_xy_v2(points)

array([[ 1,  0,  1],
       [ 0,  1, 10],
       [ 0,  0,  2]])

Notice that while both will return correct results, they will not be sorted as the original lists were, you can simply add another lexsort at the end to fix this if you like.

like image 185
user3483203 Avatar answered Oct 14 '22 00:10

user3483203


Sorry, not a purely "numpy" solution either, but numpy_indexed package provides a very convenient (and fast) way to do this.

import numpy_indexed as npi
npi.group_by(points[:, 0:2]).max(points[:,3])

Comparison to Other Methods

%timeit npi.group_by(points[:, 0:2]).max(points[:,3])
58 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


%timeit pd.DataFrame(points, columns=['X', 'Y', 'Z', 'V']).groupby(by=['X', 'Y']).apply(lambda r: r['V'].max()).reset_index().values
3.15 ms ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

def max_xy_ver1(a):
    res = a[np.lexsort([a[:, 0], a[:, 1], a[:, 3]])[::-1]]
    vals, idx = np.unique(res[:, :2], 1, axis=0)
    maximums = res[idx]
    return maximums[:, [0,1,3]]

%timeit max_xy_ver1(points)
63.5 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

def max_xy_ver2(a):
    res = a[np.lexsort([a[:, 3], a[:, 1], a[:, 0]])[::-1]]
    res = res[np.append([True], np.any(np.diff(res[:, :2],axis=0),1))]
    return res[:, [0,1,3]]

%timeit_max_xy_ver2(points) # current winner
31.7 µs ± 524 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

def findmaxes_simple(points):
    maxes = {}
    for index in range(points.shape[0]):
        x = points[index, 0]
        y = points[index, 1]
        v = points[index, 3]
        maxes[(x, y)] = v if (x,y) not in maxes else max(maxes[(x, y)],v)
    return maxes

%timeit findmaxes_simple(points)
82.6 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Installing numpy_indexed via Pip

pip install --user numpy_indexed

(If you are on Ubuntu and some other Linux distros, you may have to use pip3 to install the package for python 3)

Data Used for Tests

Pastebin here.

like image 32
Greg Kramida Avatar answered Oct 14 '22 01:10

Greg Kramida