Masking a DataFrame using multiple criteria

I know one can mask out certain rows in a data frame using e.g.

(1) mask = df['A']=='a'

where df is the data frame at hand having a column named 'A'. Calling df[mask] yields my new "masked" DataFrame.
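A minimal runnable sketch of (1). It assumes a small made-up DataFrame; column 'B' and all values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only column 'A' matters for the mask.
df = pd.DataFrame({"A": ["a", "b", "c", "e"], "B": [1, 2, 3, 4]})

# Boolean Series: True where column 'A' equals 'a'.
mask = df["A"] == "a"

# Indexing with a boolean Series keeps only the rows where mask is True.
df_new = df[mask]
print(df_new)
```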

One can of course also use multiple criteria with

(2) mask = (df['A']=='a') | (df['A']=='b')

This gets tedious, however, when there are many values to test against (any one of which should keep the row), e.g.

(3) mask = (df['A']=='a') | (df['A']=='b') | (df['A']=='c') | (df['A']=='d') | ...
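The chain in (3) does not have to be typed out by hand; it can be built from a list. A sketch, assuming a hypothetical list called values (named so to avoid shadowing Python's built-in filter), that folds the per-value comparisons together with functools.reduce and operator.or_:

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": ["a", "b", "c", "d", "e", "f"], "B": range(6)})
values = ["a", "b", "c", "d"]

# One boolean Series per value, OR-ed together pairwise:
# equivalent to (df['A']=='a') | (df['A']=='b') | ... written out.
mask = reduce(operator.or_, (df["A"] == v for v in values))
df_new = df[mask]
print(df_new)
```

Note that reduce raises a TypeError if values is empty, since no initial value is given.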

Now, say I have my filtering values in a list:

(4) filter = ['a', 'b', 'c', 'd', ...]
    # ... here means a lot of other criteria

Is there a way to get the same result as in (3) above, using a one-liner?

Something like:

(5) mask = df.where(df['A']==filter)
    df_new = df[mask]

In this case (5) obviously returns an error.
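As far as I can tell, (5) fails before df.where even runs: == between a Series and a list compares position by position rather than testing membership, so pandas rejects a list whose length differs from the column's. A sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample data: 5 rows, while the list below has 4 entries.
df = pd.DataFrame({"A": ["a", "b", "c", "e", "f"]})
values = ["a", "b", "c", "d"]

error = None
try:
    # '==' with a list compares element by element, position by position,
    # so the list must have the same length as the column.
    df["A"] == values
except ValueError as exc:
    error = exc
print(type(error).__name__)

# With a list of matching length there is no error, but the result is a
# positional comparison, not a membership test.
same_length = ["a", "x", "c", "y", "z"]
positional = df["A"] == same_length
print(positional.tolist())
```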

asked Aug 21 '14 by gussilago


People also ask

How do you filter a DataFrame in multiple conditions?

Using loc to filter with multiple conditions: the loc indexer in pandas accesses groups of rows or columns by label or by a boolean array. Wrap each condition in parentheses and combine them with the & operator; the result is a DataFrame containing only the rows that satisfy every condition.
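A minimal sketch of combining conditions inside loc, assuming a hypothetical DataFrame with columns A, B and C. It also uses loc's second argument to pick columns by label.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"A": ["a", "b", "a", "c"], "B": [5, 12, 20, 30], "C": ["w", "x", "y", "z"]}
)

# Parentheses are required: & binds more tightly than == and >.
# & keeps a row only if both conditions are True for it.
rows = (df["A"] == "a") & (df["B"] > 10)

# loc takes a row selector (here the boolean mask) and a list of column labels.
subset = df.loc[rows, ["A", "C"]]
print(subset)
```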

How do you select rows of pandas DataFrame using multiple conditions?

To select rows based on multiple conditions, combine them with the & operator, e.g. dfobj.loc[(dfobj['Name'] == 'Rack') & (dfobj['Marks'] == 100)]. This returns the subset of rows where Name is 'Rack' and Marks is 100.
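A self-contained version of the Name/Marks example, with a hypothetical dfobj. DataFrame.query is shown as an alternative spelling of the same filter.

```python
import pandas as pd

# Hypothetical data for the Name/Marks example.
dfobj = pd.DataFrame(
    {"Name": ["Rack", "Rack", "Mike", "Sam"], "Marks": [100, 85, 100, 90]}
)

# Both conditions must hold: Name is 'Rack' AND Marks is 100.
result = dfobj.loc[(dfobj["Name"] == "Rack") & (dfobj["Marks"] == 100)]

# Same filter expressed as a query string.
via_query = dfobj.query("Name == 'Rack' and Marks == 100")
print(result)
```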

How do you select rows of pandas DataFrame based on values in a list?

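Use isin(). Series.isin() returns a boolean mask marking which entries of a column appear in a given list of values; indexing the DataFrame with that mask selects the matching rows. (DataFrame.isin() also exists, but it returns an element-wise boolean DataFrame rather than a row mask.) The list can be stored in a variable or passed directly.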
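A sketch of both ways of passing the list, assuming hypothetical data. It also shows ~, which inverts the mask to keep the rows whose value is not in the list.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": ["a", "b", "x", "d", "y"], "B": [1, 2, 3, 4, 5]})

wanted = ["a", "b", "d"]
from_variable = df[df["A"].isin(wanted)]        # list held in a variable
inline = df[df["A"].isin(["a", "b", "d"])]      # list passed directly

# ~ negates the boolean mask: rows whose 'A' value is NOT in the list.
excluded = df[~df["A"].isin(wanted)]
print(from_variable)
print(excluded)
```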


1 Answer

I would use Series.isin():

filter = ['a', 'b', 'c', 'd']
df_new = df[df["A"].isin(filter)]

df_new is a DataFrame containing only the rows whose value in column "A" appears in filter.
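A runnable check of the answer against the hand-written chain (3) from the question, using made-up data. The list is called filter_values here only to avoid shadowing Python's built-in filter.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": ["a", "e", "b", "c", "f", "d"], "B": range(6)})
filter_values = ["a", "b", "c", "d"]

# One-liner from the answer: keep rows whose 'A' value appears in the list.
df_new = df[df["A"].isin(filter_values)]

# The OR chain from the question selects exactly the same rows.
chained = df[(df["A"] == "a") | (df["A"] == "b") | (df["A"] == "c") | (df["A"] == "d")]
print(df_new)
```

The rows come back in their original DataFrame order, not in the order of the list.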

answered Oct 13 '22 by Alex Riley