I know one can mask out certain rows in a data frame using e.g.
(1) mask = df['A']=='a'
where df is the data frame at hand having a column named 'A'. Calling df[mask] yields my new "masked" DataFrame.
One can of course also use multiple criteria with
(2) mask = (df['A']=='a') | (df['A']=='b')
This last step however can get a bit tedious when there are several criteria that need to be fulfilled, e.g.
(3) mask = (df['A']=='a') | (df['A']=='b') | (df['A']=='c') | (df['A']=='d') | ...
Now, say I have my filtering criteria in an array as
(4) filter = ['a', 'b', 'c', 'd', ...]
# ... here means a lot of other criteria
Is there a way to get the same result as in (3) above, using a one-liner?
Something like:
(5) mask = df.where(df['A']==filter)
df_new = df[mask]
In this case (5) obviously returns an error.
Using loc to Filter With Multiple Conditions
The loc indexer in pandas selects groups of rows or columns by label or by boolean mask. Wrap each condition you want to apply in parentheses and combine the conditions with the & operator (logical AND) or the | operator (logical OR). The result is a pandas DataFrame containing only the rows that satisfy the combined condition.
To select rows based on multiple conditions, we can use the & operator. In this example we pass multiple conditions using the code dfobj.loc[(dfobj['Name'] == 'Rack') & (dfobj['Marks'] == 100)]. This returns the subset of DataFrame rows where Name is 'Rack' and Marks is 100.
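A minimal sketch of the loc filter described above. The contents of dfobj are made up for illustration; only the column names Name and Marks come from the text.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text above.
dfobj = pd.DataFrame({
    "Name": ["Rack", "Rack", "Sam"],
    "Marks": [100, 80, 100],
})

# Each condition needs its own parentheses because & binds more tightly than ==.
subset = dfobj.loc[(dfobj["Name"] == "Rack") & (dfobj["Marks"] == 100)]
print(subset)
```

Only the first row satisfies both conditions, so subset holds a single row. Replacing & with | would keep rows that satisfy either condition.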
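Using isin() to Select Rows From a List of Values
The isin() method (available on both Series and DataFrame) checks whether each value is contained in a list of values; the resulting boolean mask can be used to filter rows. You can store the list of values in a variable and pass it to isin(), or pass the list directly.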
I would use Series.isin():

filter = ['a', 'b', 'c', 'd']
df_new = df[df["A"].isin(filter)]

df_new is a DataFrame with rows in which the entry of df["A"] appears in filter.
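To make this concrete, here is a small runnable sketch under assumed data. The DataFrame contents and the column B are invented for illustration, and the list is named values rather than filter only to avoid shadowing Python's built-in filter function.

```python
import pandas as pd

# Hypothetical sample data; only column "A" and the letters come from the question.
df = pd.DataFrame({
    "A": ["a", "b", "x", "d", "y"],
    "B": [1, 2, 3, 4, 5],
})

values = ["a", "b", "c", "d"]

# One-liner mask: True wherever df["A"] matches any entry in values.
mask = df["A"].isin(values)
df_new = df[mask]

# The same mask written out as chained comparisons, as in (3) of the question.
chained = (df["A"] == "a") | (df["A"] == "b") | (df["A"] == "c") | (df["A"] == "d")

# Negating the mask with ~ keeps the rows that are NOT in the list.
df_rest = df[~mask]

print(df_new)
print(df_rest)
```

The isin() mask and the chained mask select the same rows; isin() simply scales to an arbitrarily long list without extra typing.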