I have the following data frame:
import pandas as pd
df = pd.DataFrame({'AAA' : ['w','x','y','z'], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
Which looks like this:
In [32]: df
Out[32]:
AAA BBB CCC
0 w 10 100
1 x 20 50
2 y 30 -30
3 z 40 -50
What I want to do is apply a function to every row, using every column except the non-numeric one (in this case AAA). In the real data the non-numeric column is always the first one, and the remaining columns (there may be more than two) are always numeric.
The final desired output is:
AAA BBB CCC Score
0 w 10 100 110
1 x 20 50 70
2 y 30 -30 0
3 z 40 -50 -10
I tried this but failed:
import numpy as np
df["Score"] = df.apply(np.sum, axis=1)
What's the right way to do it?
Update 2:
This is the code that gives a SettingWithCopyWarning.
Please restart IPython before testing.
import pandas as pd
import numpy as np
def cvscore(fclist):
    sd = np.std(fclist)
    mean = np.mean(fclist)
    cv = sd/mean
    return cv

def calc_cvscore_on_df(df):
    df["CV"] = df.iloc[:,1:].apply(cvscore, axis=1)
    return df

df3 = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])
calc_cvscore_on_df(df3[["a","b"]])
To apply a function to every row of a DataFrame, pass axis=1 to apply(). The default is axis=0, which applies the function to each column instead. Applying a function row by row lets you build a new column from the values in each row, or update the row itself.
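As a minimal sketch of the difference between the two axes, the snippet below applies one function column by column and another row by row. The add_3 helper and the small numeric frame are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical numeric-only frame, used only for this illustration
nums = pd.DataFrame({'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]})

def add_3(values):
    # Hypothetical helper: receives one whole column as a Series when axis=0
    return values + 3

# Default axis=0: add_3 is called once per column
by_column = nums.apply(add_3)

# axis=1: the function is called once per row and reduces it to a single value
row_totals = nums.apply(lambda row: row.sum(), axis=1)

print(by_column)
print(row_totals)
```

With axis=0 the result keeps the shape of the frame, while the row-wise reduction returns one value per row, which is what a new column needs.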
To select everything but the first column, you could use df.iloc[:, 1:]:
In [371]: df['Score'] = df.iloc[:, 1:].sum(axis=1)
In [372]: df
Out[372]:
AAA BBB CCC Score
0 w 10 100 110
1 x 20 50 70
2 y 30 -30 0
3 z 40 -50 -10
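If the position of the text column cannot be relied on, a variation is to select columns by dtype rather than by position. This is only a sketch: it assumes every numeric column should contribute to the score, and it uses select_dtypes, which is a swapped-in alternative to the positional slice above.

```python
import pandas as pd

df = pd.DataFrame({'AAA': ['w', 'x', 'y', 'z'],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})

# Keep only the numeric columns, wherever they sit in the frame
numeric = df.select_dtypes(include='number')

# Row-wise sum over the numeric columns only
df['Score'] = numeric.sum(axis=1)
print(df)
```

Note that running this a second time on the same frame would include the existing Score column in the sum, since it is numeric too.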
To apply an arbitrary function, func, to each row:
df.iloc[:, 1:].apply(func, axis=1)
For example,
import numpy as np
import pandas as pd
def cvscore(fclist):
    sd = np.std(fclist)
    mean = np.mean(fclist)
    cv = sd/mean
    return cv

df = pd.DataFrame({'AAA' : ['w','x','y','z'], 'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
df['Score'] = df.iloc[:, 1:].apply(cvscore, axis=1)
print(df)
yields
AAA BBB CCC Score
0 w 10 100 0.818182
1 x 20 50 0.428571
2 y 30 -30 inf
3 z 40 -50 -9.000000
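On the SettingWithCopyWarning from the update: one likely cause is that df3[["a","b"]] is a selection taken from df3, and calc_cvscore_on_df then adds a column to that selection in place. A minimal sketch of one way around it, assuming the caller wants an independent frame and does not need df3 modified, is to pass an explicit copy. Newer pandas versions that use Copy-on-Write may not emit the warning at all.

```python
import numpy as np
import pandas as pd

def cvscore(fclist):
    # Coefficient of variation: standard deviation divided by mean
    return np.std(fclist) / np.mean(fclist)

def calc_cvscore_on_df(df):
    # Adds a "CV" column computed from every column except the first
    df["CV"] = df.iloc[:, 1:].apply(cvscore, axis=1)
    return df

df3 = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])

# .copy() makes the selection an independent frame, so adding "CV" to it
# cannot be confused with an attempt to modify df3
result = calc_cvscore_on_df(df3[["a", "b"]].copy())
print(result.head())
```

Because the function still mutates its argument, another option would be to copy inside calc_cvscore_on_df; which is preferable depends on whether callers expect their frame to be changed.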