Filter a numpy array based on largest value

Question

I have a numpy array that holds 4-dimensional vectors in the format (x, y, z, w).

The array has shape N x 4. Each row holds an (x, y, z) spatial location, and w holds a particular measurement at that location. There can be several measurements associated with the same (x, y, z) position (measured as floats).

I would like to filter the array so that the new array keeps only the maximum measurement for each (x, y, z) position.

So if my data is like:

x, y, z, w1
x, y, z, w2
x, y, z, w3

where w1 is greater than w2 and w3, the filtered data would be:

x, y, z, w1

So more concretely, say I have data like:

[[ 0.7732126   0.48649481  0.29771819  0.91622924]
 [ 0.7732126   0.48649481  0.29771819  1.91622924]
 [ 0.58294263  0.32025559  0.6925856   0.0524125 ]
 [ 0.58294263  0.32025559  0.6925856   0.05 ]
 [ 0.58294263  0.32025559  0.6925856   1.7 ]
 [ 0.3239913   0.7786444   0.41692853  0.10467392]
 [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
 [ 0.13536096  0.60319054  0.82018125  0.10445047]
 [ 0.1877724   0.96060999  0.39697999  0.59078612]]

This should return

[[ 0.7732126   0.48649481  0.29771819  1.91622924]
 [ 0.58294263  0.32025559  0.6925856   1.7 ]
 [ 0.3239913   0.7786444   0.41692853  0.10467392]
 [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
 [ 0.13536096  0.60319054  0.82018125  0.10445047]
 [ 0.1877724   0.96060999  0.39697999  0.59078612]]

Jaime · Accepted Answer

This is convoluted, but it is probably as good as you are going to get using numpy only...

First, we use lexsort to put all entries with the same coordinates together. With a being your sample array:

>>> perm = np.lexsort(a[:, 3::-1].T)
>>> a[perm]
array([[ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.05      ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.0524125 ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.7732126 ,  0.48649481,  0.29771819,  0.91622924],
       [ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924]])

Note that np.lexsort treats the last key as the primary one, so by reversing the column order we are sorting by x, breaking ties with y, then z, then w.
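To make the key order concrete, here is a minimal sketch on two made-up key arrays (the names major and minor are illustrative, not from the question). It only shows that the last key passed to np.lexsort is the primary sort key:

```python
import numpy as np

# Two hypothetical key columns: "major" is meant to be the primary key,
# "minor" the tie-breaker.
major = np.array([2, 1, 2, 1])
minor = np.array([0, 9, 5, 3])

# np.lexsort sorts by the LAST key first, so the primary key goes last.
order = np.lexsort((minor, major))

print(order)         # [3 1 0 2]
print(major[order])  # [1 1 2 2]
print(minor[order])  # [3 9 0 5]
```

This is why the answer passes the columns in reversed order: after the transpose, x ends up as the last key.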

Because it is the maximum we are looking for, we just need to take the last entry in every group. A row is the last of its group when any of its coordinates differs from the next row's (hence np.any, not np.all), and the very last row always closes a group:

>>> a_sorted = a[perm]
>>> last = np.concatenate((np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1),
...                        [True]))
>>> a_unique_max = a_sorted[last]
>>> a_unique_max
array([[ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924]])
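For reference, the steps so far can be wrapped into one function. This is a sketch under a few assumptions: the function name max_per_coordinate is made up, the input is a non-empty N x 4 float array, and coordinates are compared for exact equality. It uses np.any for the group boundary test, so rows that share some coordinates but not all of them are treated as separate groups:

```python
import numpy as np

def max_per_coordinate(a):
    """Return one row per distinct (x, y, z), keeping the largest w.

    Hypothetical helper; assumes a non-empty array of shape (N, 4).
    """
    a = np.asarray(a)
    # Sort by x, then y, then z, then w (lexsort uses the last key first).
    perm = np.lexsort(a[:, 3::-1].T)
    a_sorted = a[perm]
    # A row ends its group if ANY coordinate differs from the next row.
    differs = np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)
    last = np.concatenate((differs, [True]))
    return a_sorted[last]

a = np.array([[1.0, 2.0, 3.0, 0.5],
              [1.0, 2.0, 3.0, 1.5],
              [1.0, 2.0, 4.0, 0.7],   # shares x and y with the rows above
              [0.0, 9.0, 9.0, 0.2]])
result = max_per_coordinate(a)
print(result)
```

The third row shares x and y with the first two but has a different z, so it stays in its own group and is kept.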

If you would rather keep the rows in the order in which they appear in the original array instead of in sorted order, you can recover that with the aid of perm:

>>> a_unique_max[np.argsort(perm[last])]
array([[ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612]])
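A small sketch of why this works, on made-up data: perm[last] holds the original row index of each kept row, so sorting those indices puts the kept rows back in input order. One thing to be aware of is that the order follows where each group's maximum row sits in the input, not where the group first appears.

```python
import numpy as np

a = np.array([[5.0, 5.0, 5.0, 1.0],
              [1.0, 1.0, 1.0, 3.0],
              [5.0, 5.0, 5.0, 2.0],
              [1.0, 1.0, 1.0, 0.5]])

perm = np.lexsort(a[:, 3::-1].T)
a_sorted = a[perm]
last = np.concatenate(
    (np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1), [True]))

# For each kept row, its row index in the original array.
original_rows = perm[last]
print(original_rows)  # [1 2]

# Indexing a with the sorted indices gives the same rows as
# a_sorted[last][np.argsort(perm[last])].
kept_in_input_order = a[np.sort(original_rows)]
print(kept_in_input_order)
```

Here the (5, 5, 5) group appears first in the input, but its maximum is in row 2, after the (1, 1, 1) maximum in row 1, so (1, 1, 1) comes first in the result.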

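This shortcut only works for the maximum, because the maximum falls out of the sorting as a by-product (the minimum would similarly be the first entry of each group). If you are after a different function, say the product of all same-coordinates entries, you could mark the first row of each group and reduce between those boundaries:

>>> first = np.concatenate(([True],
...                         np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
>>> a_unique_prods = np.multiply.reduceat(a_sorted, np.nonzero(first)[0])

Note that reduceat multiplies every column, coordinates included, so you will still have to combine the last column of a_unique_prods with the coordinates in a_sorted[first] to assemble your return array.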
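One way to do that assembly, sketched on made-up data: reduce only the w column and stack the result next to the coordinates of each group's first row. The use of np.column_stack here is my choice, not something from the answer, and the boundary test uses np.any so that a difference in any single coordinate starts a new group.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0, 2.0],
              [0.0, 0.0, 1.0, 5.0],
              [1.0, 2.0, 3.0, 4.0]])

perm = np.lexsort(a[:, 3::-1].T)
a_sorted = a[perm]
first = np.concatenate(
    ([True], np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
starts = np.nonzero(first)[0]

# Multiply only the measurement column within each group...
prods = np.multiply.reduceat(a_sorted[:, 3], starts)
# ...and attach the (x, y, z) of each group's first row.
a_unique_prods = np.column_stack((a_sorted[first, :3], prods))
print(a_unique_prods)
# [[0. 0. 1. 5.]
#  [1. 2. 3. 8.]]
```

The same pattern should work with other ufuncs that support reduceat, such as np.add for sums.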

Randy · Answer

I see that you already got the pointer towards pandas in the comments. FWIW, here's how you can get the desired behavior, assuming you don't care about the final row order, since groupby sorts the result by the group keys by default.

In [14]: arr
Out[14]:
array([[ 0.7732126 ,  0.48649481,  0.29771819,  0.91622924],
       [ 0.7732126 ,  0.48649481,  0.29771819,  1.91622924],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.0524125 ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  0.05      ],
       [ 0.58294263,  0.32025559,  0.6925856 ,  1.7       ],
       [ 0.3239913 ,  0.7786444 ,  0.41692853,  0.10467392],
       [ 0.12080023,  0.74853649,  0.15356663,  0.4505753 ],
       [ 0.13536096,  0.60319054,  0.82018125,  0.10445047],
       [ 0.1877724 ,  0.96060999,  0.39697999,  0.59078612]])

In [15]: import pandas as pd

In [16]: pd.DataFrame(arr)
Out[16]:
          0         1         2         3
0  0.773213  0.486495  0.297718  0.916229
1  0.773213  0.486495  0.297718  1.916229
2  0.582943  0.320256  0.692586  0.052413
3  0.582943  0.320256  0.692586  0.050000
4  0.582943  0.320256  0.692586  1.700000
5  0.323991  0.778644  0.416929  0.104674
6  0.120800  0.748536  0.153567  0.450575
7  0.135361  0.603191  0.820181  0.104450
8  0.187772  0.960610  0.396980  0.590786

In [17]: pd.DataFrame(arr).groupby([0,1,2]).max().reset_index()
Out[17]:
          0         1         2         3
0  0.120800  0.748536  0.153567  0.450575
1  0.135361  0.603191  0.820181  0.104450
2  0.187772  0.960610  0.396980  0.590786
3  0.323991  0.778644  0.416929  0.104674
4  0.582943  0.320256  0.692586  1.700000
5  0.773213  0.486495  0.297718  1.916229
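If the row order does matter, a possible variation is to pass sort=False to groupby, which keeps the groups in order of first appearance, and then convert back with to_numpy(). This is a sketch on made-up data; the column names x, y, z, w are my own labels, not something from the question.

```python
import numpy as np
import pandas as pd

arr = np.array([[5.0, 5.0, 5.0, 1.0],
                [1.0, 1.0, 1.0, 3.0],
                [5.0, 5.0, 5.0, 2.0]])

# Column names are illustrative labels for the four columns.
df = pd.DataFrame(arr, columns=["x", "y", "z", "w"])

# sort=False keeps groups in order of first appearance instead of key order.
result = df.groupby(["x", "y", "z"], sort=False).max().reset_index()

# Back to a plain N x 4 numpy array.
filtered = result.to_numpy()
print(filtered)
# [[5. 5. 5. 2.]
#  [1. 1. 1. 3.]]
```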

Tags: python, arrays, numpy

Luca
