This should be incredibly easy, but I can't get it to work.
I want to filter my dataset on two or more values.
# this works when I filter for one value
df.loc[df['channel'] == 'sale']
# if I have to filter on two separate columns, I can do this
df.loc[(df['channel'] == 'sale') & (df['type'] == 'A')]
# but what if I want to filter one column by more than one value?
df.loc[df['channel'] == ('sale', 'fullprice')]
Would this have to be an OR statement? Or can I do something like SQL's IN?
df.loc[df['channel'].apply(lambda x: x in ['sale', 'fullprice'])] would also work to filter one column by multiple values.
There is a df.isin(values) method which tests whether each element in the DataFrame is contained in values. So, as @MaxU wrote in the comment, you can use
df.loc[df['channel'].isin(['sale', 'fullprice'])]
to filter one column by multiple values.