I have the following pandas DataFrame.
import pandas as pd
df = pd.read_csv('filename.csv')
print(df)
dog A B C
0 dog1 0.787575 0.159330 0.053095
1 dog10 0.770698 0.169487 0.059815
2 dog11 0.792689 0.152043 0.055268
3 dog12 0.785066 0.160361 0.054573
4 dog13 0.795455 0.150464 0.054081
5 dog14 0.794873 0.150700 0.054426
.. ....
8 dog19 0.811585 0.140207 0.048208
9 dog2 0.797202 0.152033 0.050765
10 dog20 0.801607 0.145137 0.053256
11 dog21 0.792689 0.152043 0.055268
....
I create a new column by summing columns "A"
, "B"
, "C"
as follows:
df['total_ABC'] = df[["A", "B", "B"]].sum(axis=1)
Now I would like to do this based on a conditional, i.e. if "A" < 0.78
then create a new summed column df['smallA_sum'] = df[["A", "B", "B"]].sum(axis=1)
. Otherwise, the value should be zero.
How does one create conditional statements like this?
My thought would be to use
df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))
However, this doesn't work and I'm not able to specify axis.
How do you create a column based on the values of other columns?
You could also do something like for each df['dog'] == 'dog2'
, create column dog2_sum
, i.e.
df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))
but my approach is incorrect.
The following should work, here we mask the df where the condition is met, this will set NaN
to the rows where the condition isn't met so we call fillna
on the new col:
In [67]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df
Out[67]:
A B C
0 0.197334 0.707852 -0.443475
1 -1.063765 -0.914877 1.585882
2 0.899477 1.064308 1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858 0.777523 -0.453747
In [73]:
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'].fillna(0, inplace=True)
df
Out[73]:
A B C total
0 0.197334 0.707852 -0.443475 0.905186
1 -1.063765 -0.914877 1.585882 0.000000
2 0.899477 1.064308 1.426789 1.963785
3 -0.556486 -0.150080 -0.149494 0.000000
4 -0.035858 0.777523 -0.453747 0.000000
Another approach is to call where
on the sum
result, this takes a value param to return when the condition isn't met:
In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df
Out[75]:
A B C total
0 0.197334 0.707852 -0.443475 0.905186
1 -1.063765 -0.914877 1.585882 0.000000
2 0.899477 1.064308 1.426789 1.963785
3 -0.556486 -0.150080 -0.149494 0.000000
4 -0.035858 0.777523 -0.453747 0.000000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With