Conditional filtering of ndarrays

Suppose I have the following array of arrays:

import numpy as np

Input = np.array([[[[17.63,  0.  , -0.71, 29.03],
         [17.63, -0.09,  0.71, 56.12],
         [ 0.17,  1.24, -2.04, 18.49],
         [ 1.41, -0.8 ,  0.51, 11.85],
         [ 0.61, -0.29,  0.15, 36.75]]],


       [[[ 0.32, -0.14,  0.39, 24.52],
         [ 0.18,  0.25, -0.38, 18.08],
         [ 0.  ,  0.  ,  0.  ,  0.  ],
         [ 0.  ,  0.  ,  0.  ,  0.  ],
         [ 0.43,  0.  ,  0.3 ,  0.  ]]],


       [[[ 0.75, -0.38,  0.65, 19.51],
         [ 0.37,  0.27,  0.52, 24.27],
         [ 0.  ,  0.  ,  0.  ,  0.  ],
         [ 0.  ,  0.  ,  0.  ,  0.  ],
         [ 0.  ,  0.  ,  0.  ,  0.  ]]]])

Input.shape
(3, 1, 5, 4)

Alongside this Input array there is a corresponding Label array, with one label per sub-array:

Label = np.array([0, 1, 2])

Label.shape
(3,)

I need a way to check every nested array in Input and keep ONLY those with sufficient data points.

By this I mean I want to eliminate (delete) every array whose last 3 rows consist entirely of zeros, and at the same time eliminate the corresponding Label for that array.

Expected output:

Input_filtered
array([[[[17.63,  0.  , -0.71, 29.03],
         [17.63, -0.09,  0.71, 56.12],
         [ 0.17,  1.24, -2.04, 18.49],
         [ 1.41, -0.8 ,  0.51, 11.85],
         [ 0.61, -0.29,  0.15, 36.75]]],


       [[[ 0.32, -0.14,  0.39, 24.52],
         [ 0.18,  0.25, -0.38, 18.08],
         [ 0.  ,  0.  ,  0.  ,  0.  ],
         [ 0.  ,  0.  ,  0.  ,  0.  ],
         [ 0.43,  0.  ,  0.3 ,  0.  ]]]])

Label_filtered
array([0, 1])

What's the trick that I need?

arilwan asked Aug 13 '20 20:08



2 Answers

You should be able to do this with vectorized numpy commands only.

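# One boolean per sub-array: True if any entry in its last three rows is nonzero
filter_ = np.any(Input[:, :, -3:], axis=(1, 2, 3))
labels_filtered = Label[filter_]
# Index with the mask itself; wrapping it in a list adds an extra index dimension
inputs_filtered = Input[filter_]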

For the example set you provided this yields 4.95 µs ± 9.69 ns per loop (100000 loops each) compared to the solution of anon01 with 17.1 µs ± 111 ns per loop (100000 loops each). The improvement should be even more noticeable on larger arrays.
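A comparison like this can be reproduced with the standard-library timeit module. The following is only a rough sketch: the synthetic data, the array size, and the function names mask_vectorized and mask_loop are assumptions made for illustration, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the question's data: 1000 samples of shape (1, 5, 4).
data = rng.random((1000, 1, 5, 4))
data[::2, :, -3:] = 0.0  # zero the last three rows of every other sample


def mask_vectorized(arr):
    # Single NumPy reduction over all axes except the first.
    return np.any(arr[:, :, -3:], axis=(1, 2, 3))


def mask_loop(arr):
    # Python-level loop that calls np.any once per sample.
    return np.array([np.any(arr[j, :, -3:]) for j in range(len(arr))])


# Both approaches should select exactly the same samples.
assert np.array_equal(mask_vectorized(data), mask_loop(data))

n_calls = 100
t_vec = timeit.timeit(lambda: mask_vectorized(data), number=n_calls)
t_loop = timeit.timeit(lambda: mask_loop(data), number=n_calls)
print(f"vectorized: {t_vec / n_calls * 1e6:.1f} µs per call")
print(f"loop:       {t_loop / n_calls * 1e6:.1f} µs per call")
```

The equality check before timing matters: a speed comparison is only meaningful if both functions produce the same mask.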

If your data has a different number of dimensions, you can change the axis argument. For an arbitrary number of axes it could look like the following:

filter_ = np.any(Input[:, :, -3:], axis=tuple(range(1, Input.ndim)))
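Putting the pieces together, here is a minimal, self-contained sketch run on the data from the question. The helper name filter_nonzero_tail and its n_rows parameter are made up for illustration and are not part of NumPy. It assumes the rows of interest lie along axis 2, as in the (3, 1, 5, 4) example.

```python
import numpy as np


def filter_nonzero_tail(inputs, labels, n_rows=3):
    """Keep samples whose last `n_rows` rows (axis 2) contain a nonzero value.

    Hypothetical helper for illustration; not a NumPy function.
    """
    tail = inputs[:, :, -n_rows:]
    # Reduce over every axis except the first, leaving one boolean per sample.
    mask = np.any(tail, axis=tuple(range(1, inputs.ndim)))
    return inputs[mask], labels[mask], mask


Input = np.array([[[[17.63,  0.  , -0.71, 29.03],
                    [17.63, -0.09,  0.71, 56.12],
                    [ 0.17,  1.24, -2.04, 18.49],
                    [ 1.41, -0.8 ,  0.51, 11.85],
                    [ 0.61, -0.29,  0.15, 36.75]]],

                  [[[ 0.32, -0.14,  0.39, 24.52],
                    [ 0.18,  0.25, -0.38, 18.08],
                    [ 0.  ,  0.  ,  0.  ,  0.  ],
                    [ 0.  ,  0.  ,  0.  ,  0.  ],
                    [ 0.43,  0.  ,  0.3 ,  0.  ]]],

                  [[[ 0.75, -0.38,  0.65, 19.51],
                    [ 0.37,  0.27,  0.52, 24.27],
                    [ 0.  ,  0.  ,  0.  ,  0.  ],
                    [ 0.  ,  0.  ,  0.  ,  0.  ],
                    [ 0.  ,  0.  ,  0.  ,  0.  ]]]])
Label = np.array([0, 1, 2])

Input_filtered, Label_filtered, mask = filter_nonzero_tail(Input, Label)
print(mask.tolist())            # [True, True, False]
print(Input_filtered.shape)     # (2, 1, 5, 4)
print(Label_filtered.tolist())  # [0, 1]
```

The second sample is kept because its last row still contains nonzero entries (0.43 and 0.3); only the third sample has all-zero values in its last three rows.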
PythonF answered Oct 18 '22 02:10


The best way to do this depends on the scale of your data. If there are few sub-arrays (thousands or fewer), you can build a boolean filter list and apply it to both the Label and Input arrays:

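keep = []  # avoids shadowing the built-in filter()
for j in range(len(Input)):
    # Last three rows of the j-th sub-array
    arr = Input[j, :, -3:]
    # True if any entry in those rows is nonzero
    keep.append(np.any(arr))
# Index with the list itself rather than a list wrapping it
Label_filtered = Label[keep]
Input_filtered = Input[keep]

A few things to note: the vectorized NumPy operations (Input[j, :, -3:], np.any(arr)) are very fast, while the native Python iteration and list usage (for j in range(...), keep.append) are comparatively slow.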
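One detail worth checking when adapting the slice: -3: selects the last three rows, while a bare -3 selects only the third-from-last row and drops that axis. The small sketch below uses a made-up stand-in array (sample is an assumption for illustration, shaped like one Input[j] block) in which only the final row is nonzero:

```python
import numpy as np

# Stand-in for one Input[j] block of shape (1, 5, 4): only the last row is nonzero.
sample = np.zeros((1, 5, 4))
sample[0, -1] = [0.43, 0.0, 0.3, 0.0]

single_row = sample[:, -3]    # third-from-last row only; the row axis is dropped
last_three = sample[:, -3:]   # the last three rows; the row axis is kept

print(single_row.shape)          # (1, 4)
print(last_three.shape)          # (1, 3, 4)
print(bool(np.any(single_row)))  # False
print(bool(np.any(last_three)))  # True
```

With the single-row slice, a sub-array like the second one in the question would be discarded even though its last row still holds data.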

anon01 answered Oct 18 '22 01:10