Here is a simplified example of my df:
    ds = pd.DataFrame(np.abs(randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])
    ds
              A         B         C         D
    1  1.099679  0.042043  0.083903  0.410128
    2  0.268205  0.718933  1.459374  0.758887
    3  0.680566  0.538655  0.038236  1.169403
I would like to sum the data in the columns row wise:
    ds['sum']=ds.sum(axis=1)
    ds
              A         B         C         D       sum
    1  0.095389  0.556978  1.646888  1.959295  4.258550
    2  1.076190  2.668270  0.825116  1.477040  6.046616
    3  0.245034  1.066285  0.967124  0.791606  3.070049
Now, here comes my question! I would like to create 4 new columns, each holding a value's share of the row total (sum). So the first value in the first new column should be (0.095389/4.258550), the first value in the second new column should be (0.556978/4.258550), and so on. Help please.
Create a new column by assigning the output to the DataFrame with the new column name inside the []. Operations are element-wise, so there is no need to loop over rows. Use rename with a dictionary or function to rename row labels or column names.
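As a minimal sketch of the point above, assuming a small hand-made frame (the values and the `A_share` name are illustrative and not taken from the question):

```python
import pandas as pd

# Illustrative data; the values are made up for this sketch.
ds = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]}, index=[1, 2])

# Element-wise arithmetic on whole columns; no explicit loop over rows.
ds["A_share"] = ds["A"] / (ds["A"] + ds["B"])

# rename takes a dict mapping old labels to new ones and returns a new frame.
renamed = ds.rename(columns={"A_share": "A_perc"})
```

Here `renamed` is a new DataFrame; `ds` keeps the original `A_share` label unless the result is assigned back.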
Using the apply() method: if you need to apply a function over existing columns to compute values that will be added as new columns in the existing DataFrame, the pandas.DataFrame.apply() method should do the trick.
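A rough sketch of what that could look like for row shares, assuming made-up data; `shares` is a hypothetical name. Note that `apply` with `axis=1` calls the function once per row, so it is usually slower than a vectorized operation on the whole frame.

```python
import pandas as pd

# Illustrative data; the values are made up for this sketch.
ds = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]}, index=[1, 2])

# axis=1 passes each row to the function as a Series;
# dividing the row by its own sum gives that row's shares.
shares = ds.apply(lambda row: row / row.sum(), axis=1)
```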
To add a new column with a constant value, use the square bracket, i.e. the index operator, and set that value.
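For instance, under the assumption of a one-column frame and a hypothetical `unit` column:

```python
import pandas as pd

# Illustrative data; the values are made up for this sketch.
ds = pd.DataFrame({"A": [1.0, 2.0]}, index=[1, 2])

# Assigning a scalar broadcasts it to every row of the new column.
ds["unit"] = "fraction"
```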
You can do this easily manually for each column like this:
    ds['A_perc'] = ds['A']/ds['sum']
If you want to do this in one step for all columns, you can use the div
method (http://pandas.pydata.org/pandas-docs/stable/basics.html#matching-broadcasting-behavior):
ds.div(ds['sum'], axis=0)
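To illustrate what `axis=0` does here, a small self-contained sketch follows. It assumes a seeded NumPy generator in place of the `randn` call from the question, and the names `row_totals` and `fractions` are made up for the example.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily so the sketch is reproducible.
rng = np.random.default_rng(0)
ds = pd.DataFrame(
    np.abs(rng.standard_normal((3, 4))),
    index=[1, 2, 3],
    columns=["A", "B", "C", "D"],
)

row_totals = ds.sum(axis=1)

# axis=0 aligns row_totals with the row index,
# so every row is divided by its own total.
fractions = ds.div(row_totals, axis=0)
```

Each row of `fractions` should then add up to 1 (up to floating-point rounding). Without `axis=0`, pandas would try to align the labels of `row_totals` with the column names instead.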
And if you want this in one step added to the same dataframe:
    >>> ds.join(ds.div(ds['sum'], axis=0), rsuffix='_perc')
              A         B         C         D       sum    A_perc    B_perc  \
    1  0.151722  0.935917  1.033526  0.941962  3.063127  0.049532  0.305543
    2  0.033761  1.087302  1.110695  1.401260  3.633017  0.009293  0.299283
    3  0.761368  0.484268  0.026837  1.276130  2.548603  0.298739  0.190013

         C_perc    D_perc  sum_perc
    1  0.337409  0.307517         1
    2  0.305722  0.385701         1
    3  0.010530  0.500718         1
    In [56]: df = pd.DataFrame(np.abs(randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])

    In [57]: df.divide(df.sum(axis=1), axis=0)
    Out[57]:
              A         B         C         D
    1  0.319124  0.296653  0.138206  0.246017
    2  0.376994  0.326481  0.230464  0.066062
    3  0.036134  0.192954  0.430341  0.340571
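Putting the pieces together, here is one possible end-to-end sketch that keeps the original columns, adds `sum`, and appends the four `_perc` columns without also producing a `sum_perc` column. It assumes a seeded generator instead of `randn`, and `value_cols`, `total`, `perc` and `result` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily so the sketch is reproducible.
rng = np.random.default_rng(0)
ds = pd.DataFrame(
    np.abs(rng.standard_normal((3, 4))),
    index=[1, 2, 3],
    columns=["A", "B", "C", "D"],
)

value_cols = ["A", "B", "C", "D"]

# Row totals over the value columns only.
total = ds[value_cols].sum(axis=1)

# Divide each row by its total, then label the new columns with a suffix.
perc = ds[value_cols].div(total, axis=0).add_suffix("_perc")

# Original columns, then 'sum', then the four share columns.
result = ds.assign(sum=total).join(perc)
```

The `_perc` columns hold fractions between 0 and 1; multiplying `perc` by 100 before the join would give percentages instead.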