pandas DataFrame: create new columns and fill them with values calculated from the same df

Here is a simplified example of my df:

    import numpy as np
    import pandas as pd
    from numpy.random import randn

    ds = pd.DataFrame(np.abs(randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])
    ds
              A         B         C         D
    1  1.099679  0.042043  0.083903  0.410128
    2  0.268205  0.718933  1.459374  0.758887
    3  0.680566  0.538655  0.038236  1.169403

I would like to sum the data in the columns row-wise:

    ds['sum'] = ds.sum(axis=1)
    ds
              A         B         C         D       sum
    1  0.095389  0.556978  1.646888  1.959295  4.258550
    2  1.076190  2.668270  0.825116  1.477040  6.046616
    3  0.245034  1.066285  0.967124  0.791606  3.070049
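To illustrate what axis=1 does, here is a minimal sketch using small hand-made values (hypothetical, not the question's random data). It also shows a pitfall worth keeping in mind: once the sum column exists, summing each row again counts that column too.

```python
import pandas as pd

# Hypothetical fixed values instead of the question's random data
ds = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]}, index=[1, 2])

# axis=1 sums across the columns of each row
ds['sum'] = ds.sum(axis=1)
print(ds['sum'].tolist())  # [4.0, 6.0]

# Pitfall: summing again now also includes the 'sum' column itself
print(ds.sum(axis=1).tolist())  # [8.0, 12.0]
```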

Now here is my question: I would like to create 4 new columns that hold, for every row, each value as a percentage of the row total (sum). So the first value in the first new column should be 0.095389/4.258550, the first value in the second new column 0.556978/4.258550, and so on. How can I do this?
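The requested arithmetic for a single row can be checked by hand. This sketch copies the first row of the example output above into a plain dict (the dict layout is just for illustration) and divides each value by that row's total; the resulting fractions add up to 1.

```python
# First row of the question's example (values copied from the output above)
row = {'A': 0.095389, 'B': 0.556978, 'C': 1.646888, 'D': 1.959295}
total = 4.258550  # the 'sum' column for that row

# Each new column value is the original value divided by the row total
fractions = {col: value / total for col, value in row.items()}
for col, frac in fractions.items():
    print(col, round(frac, 4))

# The fractions of one row add up to 1
print(round(sum(fractions.values()), 4))  # 1.0
```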

asked Aug 29 '13 07:08 by jonas



2 Answers

You can easily do this manually for each column:

    ds['A_perc'] = ds['A'] / ds['sum']
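Repeating that single-column assignment for every original column only needs a loop. A minimal sketch, assuming a small hand-made frame (hypothetical data) and the _perc suffix used in the answer:

```python
import pandas as pd

# Hypothetical fixed data standing in for the question's random frame
ds = pd.DataFrame({'A': [1.0, 3.0], 'B': [3.0, 1.0]}, index=[1, 2])
ds['sum'] = ds.sum(axis=1)

# Every column except 'sum'; the Index is computed once, before the loop
for col in ds.columns.drop('sum'):
    ds[col + '_perc'] = ds[col] / ds['sum']

print(ds)
```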

If you want to do this in one step for all columns, you can use the div method (http://pandas.pydata.org/pandas-docs/stable/basics.html#matching-broadcasting-behavior):

    ds.div(ds['sum'], axis=0)
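The axis=0 argument is what makes div work row by row. This sketch (hypothetical data, with string row labels chosen only to keep the example simple) shows the intended result and, for contrast, what happens with the default axis, where the Series labels are matched against the column labels and nothing lines up.

```python
import pandas as pd

# Hypothetical data; string row labels keep the example simple
ds = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 2.0]}, index=['r1', 'r2'])
ds['sum'] = ds.sum(axis=1)

# axis=0 matches the divisor Series against the row labels,
# so each row is divided by its own total
fractions = ds.div(ds['sum'], axis=0)
print(fractions)

# With the default axis ('columns') the Series labels ('r1', 'r2') are
# matched against the column labels ('A', 'B', 'sum') instead; nothing
# lines up, so every cell of the result is NaN
mismatched = ds.div(ds['sum'])
print(mismatched.isna().all().all())  # True
```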

And if you want to add the result to the same DataFrame in one step:

    >>> ds.join(ds.div(ds['sum'], axis=0), rsuffix='_perc')
              A         B         C         D       sum    A_perc    B_perc  \
    1  0.151722  0.935917  1.033526  0.941962  3.063127  0.049532  0.305543
    2  0.033761  1.087302  1.110695  1.401260  3.633017  0.009293  0.299283
    3  0.761368  0.484268  0.026837  1.276130  2.548603  0.298739  0.190013

         C_perc    D_perc  sum_perc
    1  0.337409  0.307517         1
    2  0.305722  0.385701         1
    3  0.010530  0.500718         1
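Because the sum column is divided as well, the joined result carries a sum_perc column that is always 1. If that column is not wanted, one option (sketched here with hypothetical hand-made data) is to divide only the original value columns and name them with add_suffix before joining.

```python
import pandas as pd

# Hypothetical fixed data in place of the random frame
ds = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 2.0]}, index=[1, 2])
ds['sum'] = ds.sum(axis=1)

# Divide only the original value columns, then suffix their names
value_cols = ['A', 'B']
perc = ds[value_cols].div(ds['sum'], axis=0).add_suffix('_perc')

# join aligns on the shared row index
ds = ds.join(perc)
print(ds.columns.tolist())  # ['A', 'B', 'sum', 'A_perc', 'B_perc']
```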
answered Oct 15 '22 08:10 by joris


    In [56]: df = pd.DataFrame(np.abs(randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])

    In [57]: df.divide(df.sum(axis=1), axis=0)
    Out[57]:
              A         B         C         D
    1  0.319124  0.296653  0.138206  0.246017
    2  0.376994  0.326481  0.230464  0.066062
    3  0.036134  0.192954  0.430341  0.340571
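Both answers produce fractions of the row total, while the question speaks of percentages. A minimal sketch, assuming hand-made data, that divides by the row sums directly (no stored sum column) and then scales by 100:

```python
import pandas as pd

# Hypothetical fixed data in place of the random frame
df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 2.0]}, index=[1, 2])

# Row-wise fractions without storing a 'sum' column first
fractions = df.divide(df.sum(axis=1), axis=0)

# Scale to percentages, as the question describes them
percent = fractions.mul(100)
print(percent)

# Each row of fractions sums to 1 (each row of percent to 100)
print(fractions.sum(axis=1).tolist())  # [1.0, 1.0]
```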
answered Oct 15 '22 07:10 by waitingkuo