I have a dataset with various columns as below:
discount tax total subtotal productid
3.98 1.06 21.06 20 3232
3.98 1.06 21.06 20 3232
3.98 6 106 100 3498
3.98 6 106 100 3743
3.98 6 106 100 3350
3.98 6 106 100 3370
46.49 3.36 66.84 63 695
Now, I need to add a new column Class and assign it the value 0 or 1 on the basis of the following conditions:
if:
discount > 20%
no tax
total > 100
then the Class will be 1
otherwise it should be 0
I have done it with a single condition, but I don't know how to accomplish it with multiple conditions.
Here's what I have tried:
df_full['Class'] = df_full['amount'].map(lambda x: 1 if x > 100 else 0)
I have taken a look at all the other similar questions but couldn't find a solution for my problem. I have tried the approaches from all of the above-mentioned posts but am stuck on this error:
TypeError: '>' not supported between instances of 'str' and 'int'
Here is how I tried the first posted answer:
df_full['class'] = np.where( ( (df_full['discount'] > 20) & (df_full['tax'] == 0 ) & (df_full['total'] > 100) & df_full['productdiscount'] ) , 1, 0)
Using the apply() method: if you need to apply a function over existing columns in order to compute values that will be added as a new column in the existing DataFrame, then the pandas DataFrame.apply() method should do the trick.
Method 1: Using DataFrame.loc[]. With this indexer, we can access a group of rows or columns with a condition or a boolean array. If we can access them, we can also modify their values, so DataFrame.loc[] in pandas lets us select a column and change its values based on a condition.
Using loc to filter with multiple conditions: the loc indexer in pandas can be used to access groups of rows or columns by label or by boolean mask. Wrap each condition you want applied to the filtered result in parentheses and combine them with the & operator. The result is a pandas DataFrame containing only the rows that match.
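As a minimal sketch of filtering with loc and several conditions, the snippet below uses three rows copied from the question's data. The name `filtered` and the thresholds (total > 50, discount > 20) are only illustrative assumptions, not the asker's actual rules.

```python
import pandas as pd

# Three sample rows copied from the question's data
df = pd.DataFrame({
    "discount": [3.98, 3.98, 46.49],
    "tax": [1.06, 6.0, 3.36],
    "total": [21.06, 106.0, 66.84],
    "subtotal": [20, 100, 63],
    "productid": [3232, 3498, 695],
})

# Each comparison is parenthesised because & binds tighter than > and ==.
# The thresholds here are illustrative only.
mask = (df["total"] > 50) & (df["discount"] > 20)

# loc with a boolean mask keeps only the rows where the mask is True
filtered = df.loc[mask]
print(filtered)
```

Only the row with productid 695 satisfies both comparisons in this sample.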
You can apply an arbitrary function across a dataframe row using DataFrame.apply.
In your case, you could define a function like:
def conditions(s):
    if (s['discount'] > 20) or (s['tax'] == 0) or (s['total'] > 100):
        return 1
    else:
        return 0
And use it to add a new column to your data:
df_full['Class'] = df_full.apply(conditions, axis=1)
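For reference, here is a self-contained sketch of the same idea that can be run as-is. It uses three rows from the question plus one hypothetical row (marked in the comments) so that a case with no tax exists. The names `conditions_any` and `conditions_all` are made up for illustration: the first mirrors the `or` logic above, the second is the variant you would want if all three conditions must hold at once.

```python
import pandas as pd

df_full = pd.DataFrame({
    # first three rows come from the question; the last row is hypothetical
    "discount": [3.98, 3.98, 46.49, 25.0],
    "tax": [1.06, 6.0, 3.36, 0.0],
    "total": [21.06, 106.0, 66.84, 120.0],
})

def conditions_any(s):
    # 1 if at least one condition holds (same logic as the function above)
    if (s["discount"] > 20) or (s["tax"] == 0) or (s["total"] > 100):
        return 1
    return 0

def conditions_all(s):
    # 1 only if every condition holds at the same time
    if (s["discount"] > 20) and (s["tax"] == 0) and (s["total"] > 100):
        return 1
    return 0

# axis=1 passes each row to the function as a Series
df_full["Class_any"] = df_full.apply(conditions_any, axis=1)
df_full["Class_all"] = df_full.apply(conditions_all, axis=1)
print(df_full)
```

Row-wise apply is easy to read but calls the function once per row, so it can be slow on large frames compared with a boolean mask.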
Judging by the image of your data, it is rather unclear what you mean by a discount > 20%.
However, you can likely do something like this.
df['class'] = 0  # add a class column with 0 as default value
# find all rows that fulfill your conditions and set class to 1
df.loc[(df['discount'] / df['total'] > .2) &  # if discount is more than .2 of total
       (df['tax'] == 0) &                     # if tax is 0
       (df['total'] > 100),                   # if total is > 100
       'class'] = 1                           # then set class to 1
Note that & means and here; if you want or instead, use |.
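The TypeError quoted in the question ('>' not supported between instances of 'str' and 'int') usually means the columns were read in as strings. A possible fix, sketched here under that assumption, is to convert them with pd.to_numeric before comparing. The sample values are stored as strings on purpose, the last row is hypothetical, and the discount is treated as an absolute amount as in the question's np.where attempt.

```python
import numpy as np
import pandas as pd

# Values stored as strings to reproduce the situation behind the TypeError;
# the last row is hypothetical and added so that one row meets every condition
df_full = pd.DataFrame({
    "discount": ["3.98", "46.49", "25"],
    "tax": ["1.06", "3.36", "0"],
    "total": ["21.06", "66.84", "120"],
})

# Convert the relevant columns to numbers; unparseable values become NaN
cols = ["discount", "tax", "total"]
df_full[cols] = df_full[cols].apply(pd.to_numeric, errors="coerce")

# Vectorised version of the three conditions combined with & (and)
df_full["Class"] = np.where(
    (df_full["discount"] > 20) & (df_full["tax"] == 0) & (df_full["total"] > 100),
    1,
    0,
)
print(df_full)
```

Any comparison against NaN is False, so rows with values that could not be parsed end up with Class 0 in this sketch.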