Suppose I have the following array of arrays:
Input = np.array([[[[17.63, 0. , -0.71, 29.03],
[17.63, -0.09, 0.71, 56.12],
[ 0.17, 1.24, -2.04, 18.49],
[ 1.41, -0.8 , 0.51, 11.85],
[ 0.61, -0.29, 0.15, 36.75]]],
[[[ 0.32, -0.14, 0.39, 24.52],
[ 0.18, 0.25, -0.38, 18.08],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0.43, 0. , 0.3 , 0. ]]],
[[[ 0.75, -0.38, 0.65, 19.51],
[ 0.37, 0.27, 0.52, 24.27],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ]]]])
Input.shape
(3, 1, 5, 4)
Together with this Input array there is a corresponding Label array, one label per sub-array, so that:
Label = np.array([0, 1, 2])
Label.shape
(3,)
I need some way to check all nested arrays of Input and keep ONLY the arrays with sufficient data points. By this I mean I want to eliminate (or delete) every array whose entries in the last 3 rows are all zeros, and at the same time eliminate the corresponding Label for that array.
Expected output:
Input_filtered
array([[[[17.63, 0. , -0.71, 29.03],
[17.63, -0.09, 0.71, 56.12],
[ 0.17, 1.24, -2.04, 18.49],
[ 1.41, -0.8 , 0.51, 11.85],
[ 0.61, -0.29, 0.15, 36.75]]],
[[[ 0.32, -0.14, 0.39, 24.52],
[ 0.18, 0.25, -0.38, 18.08],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0.43, 0. , 0.3 , 0. ]]]])
Label_filtered
array([0, 1])
What's the trick that I need?
In NumPy, you filter an array using a boolean index list: a list of booleans, one per index of the array. If the value at an index is True, that element is included in the filtered array; if it is False, that element is excluded.
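For illustration, a minimal sketch of boolean-mask indexing with made-up values:

import numpy as np

arr = np.array([10, 20, 30, 40])
mask = [True, False, True, False]   # one boolean per index

print(arr[mask])   # [10 30] -- only positions where the mask is True are kept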
You should be able to do this with vectorized numpy commands only.
filter_ = np.any(Input[:, :, -3:], axis=(1, 2, 3))  # True where any entry in the last 3 rows is non-zero
labels_filtered = Label[filter_]
inputs_filtered = Input[filter_]
For the example set you provided this yields 4.95 µs ± 9.69 ns per loop (100000 loops each), compared to the solution of anon01 with 17.1 µs ± 111 ns per loop (100000 loops each). The improvement should be even more noticeable on larger arrays.
If your data has a different dimension you can change the axis argument. For an arbitrary number of axis it could look like the following:
filter_ = np.any(Input[:, :, -3:], axis=tuple(range(1, Input.ndim)))
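As a quick sanity check on the example arrays from the question (a sketch; it assumes Input and Label are defined exactly as above):

filter_ = np.any(Input[:, :, -3:], axis=(1, 2, 3))

print(filter_)               # [ True  True False]
print(Input[filter_].shape)  # (2, 1, 5, 4)
print(Label[filter_])        # [0 1]

The third sub-array is dropped because its last three rows are all zeros, which matches the expected Input_filtered and Label_filtered.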
The best way to do this depends on the scale of your data. If there are few sub-arrays (thousands or less) you can generate a filter list that is applied to the Label and Input arrays:
filter = []
for j in range(len(Input)):
    arr = Input[j, :, -3:]          # last 3 rows of sub-array j
    filter.append(np.any(arr))      # True if any entry is non-zero
Label_filtered = Label[filter]
Input_filtered = Input[filter]
A few things to note: the vectorized/numpy bits (Input[j,:,-3:], np.any(arr)) are very fast, while the native Python iteration and list usage (for j in range, filter.append) are very slow.
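If you want to compare the two approaches on your own data, a rough timing sketch with timeit could look like this (the function names are made up for illustration, and absolute numbers will depend on your machine and array sizes):

import timeit
import numpy as np

def loop_version():
    mask = [np.any(Input[j, :, -3:]) for j in range(len(Input))]
    return Input[mask], Label[mask]

def vectorized_version():
    mask = np.any(Input[:, :, -3:], axis=(1, 2, 3))
    return Input[mask], Label[mask]

print(timeit.timeit(loop_version, number=10_000))        # pure-Python loop over sub-arrays
print(timeit.timeit(vectorized_version, number=10_000))  # single vectorized reduction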