What would be an efficient (fast and simple) way of grouping the rows of a 2D NumPy array by conditions on its columns (e.g. group by the values in column 2) and running f1() and f2() on each of those groups?
Thanks
If you have an array arr of shape (rows, cols), you can get the vector of all values in column 2 as
col = arr[:, 2]
You can then construct a boolean array from your grouping condition. Say group 1 is made up of those rows that have a value larger than 5 in column 2:
idx = col > 5
You can apply this boolean array directly to your original array to select rows:
group_1 = arr[idx]
group_2 = arr[~idx]
For example:
>>> arr = np.random.randint(10, size=(6,4))
>>> arr
array([[0, 8, 7, 4],
       [5, 2, 6, 9],
       [9, 5, 7, 5],
       [6, 9, 1, 5],
       [8, 0, 5, 8],
       [8, 2, 0, 6]])
>>> idx = arr[:, 2] > 5
>>> arr[idx]
array([[0, 8, 7, 4],
       [5, 2, 6, 9],
       [9, 5, 7, 5]])
>>> arr[~idx]
array([[6, 9, 1, 5],
       [8, 0, 5, 8],
       [8, 2, 0, 6]])
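If you want one group per distinct value in column 2 rather than a two-way split, the same idea can be extended by sorting on that column and splitting at the points where the value changes. The following is only a sketch: group_rows is a hypothetical helper name, and f1 and f2 are stand-ins for whatever functions you actually want to run.

```python
import numpy as np

def group_rows(arr, col):
    """Split the rows of arr into groups that share a value in column col."""
    # A stable sort keeps the original row order inside each group.
    order = np.argsort(arr[:, col], kind="stable")
    sorted_arr = arr[order]
    # first[i] is the index where the i-th distinct key starts in sorted_arr.
    keys, first = np.unique(sorted_arr[:, col], return_index=True)
    groups = np.split(sorted_arr, first[1:])
    return keys, groups

# Placeholder functions; replace with your own f1 and f2.
def f1(group):
    return int(group.sum())

def f2(group):
    return group.shape[0]

arr = np.array([[0, 8, 7, 4],
                [5, 2, 6, 9],
                [9, 5, 7, 5],
                [6, 9, 1, 5],
                [8, 0, 5, 8],
                [8, 2, 0, 6]])

keys, groups = group_rows(arr, 2)
# Map each key in column 2 to the pair (f1(group), f2(group)).
results = {int(k): (f1(g), f2(g)) for k, g in zip(keys, groups)}
```

The sort costs O(n log n) once, after which every group is a contiguous slice, so this should scale better than building one boolean mask per key when there are many distinct values.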
A compact solution is to use numpy_indexed (disclaimer: I am its author), which implements a fully vectorized solution to this type of problem.
The simplest way to use it is:
import numpy_indexed as npi
npi.group_by(arr[:, col1]).mean(arr)
But this also works:
# run function f1 on each group, where the group keys are the rows of arr[:, [col1, col2]]
npi.group_by(arr[:, [col1, col2]], arr, f1)
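If numpy_indexed is not available, grouping on several key columns can be approximated in plain NumPy with np.unique along axis 0. This is a rough sketch and not how numpy_indexed works internally: group_by_columns is a hypothetical helper, and it loops over the groups in Python, so it is not fully vectorized.

```python
import numpy as np

def group_by_columns(arr, cols, func):
    """Apply func to each group of rows that share values in the columns cols."""
    # keys holds the distinct key rows; inverse maps each row of arr to its key.
    keys, inverse = np.unique(arr[:, cols], axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against shape differences across NumPy versions
    return [(tuple(key.tolist()), func(arr[inverse == i]))
            for i, key in enumerate(keys)]

arr = np.array([[1, 1, 10],
                [1, 2, 20],
                [1, 1, 30],
                [2, 1, 40]])

# Sum the last column within each (column 0, column 1) group.
out = group_by_columns(arr, [0, 1], lambda g: int(g[:, 2].sum()))
```

Each entry of out is a (key, result) pair, with the keys in the sorted order that np.unique returns.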