Here is a simplified example of my df:
    ds = pd.DataFrame(np.abs(randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])
    ds
              A         B         C         D
    1  1.099679  0.042043  0.083903  0.410128
    2  0.268205  0.718933  1.459374  0.758887
    3  0.680566  0.538655  0.038236  1.169403

I would like to sum the data in the columns row wise:
    ds['sum']=ds.sum(axis=1)
    ds
              A         B         C         D       sum
    1  0.095389  0.556978  1.646888  1.959295  4.258550
    2  1.076190  2.668270  0.825116  1.477040  6.046616
    3  0.245034  1.066285  0.967124  0.791606  3.070049

Now, here comes my question! I would like to create 4 new columns and calculate the percentage value from the total (sum) in every row. So first value in the first new column should be (0.095389/4.258550), first value in the second new column (0.556978/4.258550)...and so on... Help please
To create a new column, assign the result to the DataFrame using the new column name inside square brackets ([]). Operations are element-wise, so there is no need to loop over rows. Use rename with a dictionary or a function to rename row labels or column names.
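A minimal sketch of that kind of element-wise assignment. The numbers are made up (not the random values from the question) so the result is reproducible, and the names A_perc, A_share, first and second are only illustrative:

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random frame.
ds = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]}, index=[1, 2])

ds["sum"] = ds["A"] + ds["B"]        # element-wise addition, no row loop
ds["A_perc"] = ds["A"] / ds["sum"]   # new column created by [] assignment

# rename takes a dict (or a function) for column names and row labels.
renamed = ds.rename(
    columns={"A_perc": "A_share"},
    index={1: "first", 2: "second"},
)
print(renamed)
```

rename returns a new DataFrame here, so ds itself keeps its original labels.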
Using the apply() method: if you need to apply a function over existing columns to compute values that will eventually be added as a new column in the existing DataFrame, then the pandas.DataFrame.apply() method should do the trick.
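As a rough illustration of the apply() route for the row-fraction problem, assuming made-up data and a lambda that is not from the original posts:

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random frame.
ds = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]}, index=[1, 2])

# With axis=1, each row is passed to the function as a Series.
shares = ds.apply(lambda row: row / row.sum(), axis=1)
print(shares)
```

This calls the function once per row, so on large frames a vectorized method such as div is usually the faster choice.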
To add a new column with a constant value, use the square brackets (the index operator) and assign that value.
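For example, a small sketch with made-up data; the column name unit and its value are purely illustrative:

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random frame.
ds = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]}, index=[1, 2])

# A scalar on the right-hand side is broadcast to every row.
ds["unit"] = "fraction"
print(ds)
```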
You can do this easily manually for each column like this:
    ds['A_perc'] = ds['A']/ds['sum']

If you want to do this in one step for all columns, you can use the div method (http://pandas.pydata.org/pandas-docs/stable/basics.html#matching-broadcasting-behavior):
    ds.div(ds['sum'], axis=0)

And if you want this in one step added to the same dataframe:
    >>> ds.join(ds.div(ds['sum'], axis=0), rsuffix='_perc')
              A         B         C         D       sum    A_perc    B_perc  \
    1  0.151722  0.935917  1.033526  0.941962  3.063127  0.049532  0.305543
    2  0.033761  1.087302  1.110695  1.401260  3.633017  0.009293  0.299283
    3  0.761368  0.484268  0.026837  1.276130  2.548603  0.298739  0.190013

         C_perc    D_perc  sum_perc
    1  0.337409  0.307517         1
    2  0.305722  0.385701         1
    3  0.010530  0.500718         1
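The join above also produces a sum_perc column that is always 1, because the sum column is divided by itself. One possible variation, sketched here with made-up data and a hypothetical value_cols list, divides only the original columns and names the results with add_suffix:

```python
import pandas as pd

# Hypothetical fixed data so the fractions are easy to check by hand.
ds = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 1.0], "C": [2.0, 4.0, 4.0]},
    index=[1, 2, 3],
)
value_cols = ["A", "B", "C"]            # columns that make up the total
ds["sum"] = ds[value_cols].sum(axis=1)  # row-wise total

# Divide only the value columns by the row total (axis=0 aligns on the index),
# then suffix the names so they do not clash with the originals in the join.
perc = ds[value_cols].div(ds["sum"], axis=0).add_suffix("_perc")
result = ds.join(perc)
print(result)
```

rsuffix in join only renames overlapping columns, which is why it works in the one-liner above; add_suffix makes the naming explicit regardless of overlap.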
    In [56]: df = pd.DataFrame(np.abs(randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])

    In [57]: df.divide(df.sum(axis=1), axis=0)
    Out[57]:
              A         B         C         D
    1  0.319124  0.296653  0.138206  0.246017
    2  0.376994  0.326481  0.230464  0.066062
    3  0.036134  0.192954  0.430341  0.340571
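A quick way to sanity-check this approach is to confirm that every row of the result adds up to 1. The sketch below assumes a seeded NumPy generator in place of the bare randn call, so the values differ from the output shown above:

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator replaces randn so the run is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    np.abs(rng.standard_normal((3, 4))),
    index=[1, 2, 3],
    columns=["A", "B", "C", "D"],
)

# divide is an alias of div; axis=0 matches the row totals to the index.
frac = df.divide(df.sum(axis=1), axis=0)

# Each row of fractions should sum to 1 (up to floating-point error).
print(np.allclose(frac.sum(axis=1), 1.0))

# Multiply by 100 if actual percentages are wanted instead of fractions.
pct = frac * 100
```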