Add new column to Python Pandas DataFrame based on multiple conditions [duplicate]

I have a dataset with various columns as below:

discount  tax   total  subtotal  productid
3.98      1.06  21.06  20        3232
3.98      1.06  21.06  20        3232
3.98      6     106    100       3498
3.98      6     106    100       3743
3.98      6     106    100       3350
3.98      6     106    100       3370
46.49     3.36  66.84  63        695

Now I need to add a new column, Class, and assign it the value 0 or 1 based on the following conditions:

if:
    discount > 20%
    no tax
    total > 100
then Class should be 1
otherwise it should be 0

I have done it with a single condition, but I don't know how to accomplish it with multiple conditions.

Here's what I have tried:

df_full['Class'] = df_full['amount'].map(lambda x: 1 if x > 100 else 0)

I have taken a look at all the other similar questions but couldn't find a solution to my problem. I have tried everything in the above-mentioned posts but am stuck on this error:

TypeError: '>' not supported between instances of 'str' and 'int'

In the case of the first posted answer, I tried it as:

df_full['class'] = np.where( ( (df_full['discount'] > 20) & (df_full['tax'] == 0 ) & (df_full['total'] > 100) & df_full['productdiscount'] ) , 1, 0)
Abdul Rehman asked Mar 31 '18 09:03


People also ask

How do you create a new column in pandas using values from other columns?

Using the apply() method: if you need to run a function over an existing column to compute values that will be added as a new column in the same DataFrame, then pandas.DataFrame.apply() should do the trick.

How do you update a DataFrame column based on condition?

Method 1: using DataFrame.loc[]. With this method, we can access a group of rows or columns with a condition or a boolean array. Since we can access those values, we can also modify them, so DataFrame.loc[] lets us select a column and change its values where a condition holds.

How do I use multiple conditions in pandas?

Using loc to filter with multiple conditions: the loc indexer in pandas can be used to access groups of rows or columns by label or by boolean mask. Wrap each condition you want in parentheses and combine them with the & operator. The result is a pd.DataFrame containing only the rows that match.
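As a minimal sketch of that filtering pattern, assuming a small frame built from three of the rows in the question (the names `df` and `filtered` are illustrative only):

```python
import pandas as pd

# Sample rows taken from the table in the question.
df = pd.DataFrame({
    "discount": [3.98, 3.98, 46.49],
    "tax": [1.06, 6.0, 3.36],
    "total": [21.06, 106.0, 66.84],
})

# Each condition needs its own parentheses, because & binds more
# tightly than comparison operators such as > and ==.
filtered = df.loc[(df["total"] > 100) & (df["tax"] > 5)]
print(filtered)
```

Only the second row has both a total above 100 and a tax above 5, so `filtered` holds that single row.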


2 Answers

You can apply an arbitrary function across a dataframe row using DataFrame.apply.

In your case, you could define a function like:

def conditions(s):
    # `or` returns 1 when any one condition holds; use `and` if all three must hold
    if (s['discount'] > 20) or (s['tax'] == 0) or (s['total'] > 100):
        return 1
    else:
        return 0

And use it to add a new column to your data:

df_full['Class'] = df_full.apply(conditions, axis=1)
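To see what this does on concrete data, here is a small sketch that assumes three of the rows from the question. The `all_conditions` function and the `ClassAll` column are hypothetical additions, included only to contrast "any condition" with "every condition":

```python
import pandas as pd

# Three sample rows from the question's table.
df_full = pd.DataFrame({
    "discount": [3.98, 3.98, 46.49],
    "tax": [1.06, 6.0, 3.36],
    "total": [21.06, 106.0, 66.84],
})

def conditions(s):
    # 1 if at least one condition holds (the `or` version).
    if (s["discount"] > 20) or (s["tax"] == 0) or (s["total"] > 100):
        return 1
    return 0

def all_conditions(s):
    # Hypothetical variant: 1 only if every condition holds.
    if (s["discount"] > 20) and (s["tax"] == 0) and (s["total"] > 100):
        return 1
    return 0

# axis=1 passes each row to the function as a Series.
df_full["Class"] = df_full.apply(conditions, axis=1)
df_full["ClassAll"] = df_full.apply(all_conditions, axis=1)
print(df_full)
```

With these rows, `Class` is 0, 1, 1 (the second row has a total above 100, the third a discount above 20), while `ClassAll` is 0 everywhere because no row has zero tax.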
Gustavo Bezerra answered Sep 24 '22 21:09


Judging by the image of your data, it is rather unclear what you mean by a discount of 20%.

However, you can likely do something like this.

df['class'] = 0 # add a class column with 0 as default value

# find all rows that fulfill your conditions and set class to 1
df.loc[(df['discount'] / df['total'] > .2) & # if discount is more than .2 of total 
       (df['tax'] == 0) & # if tax is 0
       (df['total'] > 100), # if total is > 100 
       'class'] = 1 # then set class to 1

Note that & means "and" here; if you want "or" instead, use |.
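The TypeError quoted in the question suggests that at least one of the columns holds strings rather than numbers. Assuming that is the cause, a sketch like the following converts the columns with `pd.to_numeric` first and then builds the column with `np.where`, once with & and once with |. The data is illustrative, and the last row is hypothetical, added so that one row meets all three conditions:

```python
import numpy as np
import pandas as pd

# Values stored as strings to mimic the TypeError case; the last row is
# hypothetical, added so that one row satisfies every condition.
df = pd.DataFrame({
    "discount": ["3.98", "46.49", "30"],
    "tax": ["1.06", "3.36", "0"],
    "total": ["21.06", "66.84", "120"],
})

# Convert text to numbers; errors="coerce" turns unparseable values into NaN.
cols = ["discount", "tax", "total"]
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# One boolean Series per condition keeps the combined mask readable.
ratio_ok = df["discount"] / df["total"] > 0.2
no_tax = df["tax"] == 0
big_total = df["total"] > 100

df["class_and"] = np.where(ratio_ok & no_tax & big_total, 1, 0)  # all must hold
df["class_or"] = np.where(ratio_ok | no_tax | big_total, 1, 0)   # any may hold
print(df)
```

Here `class_and` is 1 only for the hypothetical last row, while `class_or` is also 1 for the second row, whose discount exceeds 20% of its total.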

Karl Anka answered Sep 22 '22 21:09