sum() to Sum All Columns. Use DataFrame. sum() to get sum/total of a DataFrame for both rows and columns, to get the total sum of columns use axis=1 param. By default, this method takes axis=0 which means summing of rows.
The second method to divide two columns is using the div() method. It divides the columns elementwise. It accepts a scalar value, series, or dataframe as an argument for dividing with the axis. If the axis is 0 the division is done row-wise and if the axis is 1 then division is done column-wise.
Sum all columns in a Pandas DataFrame into new column If we want to summarize all the columns, then we can simply use the DataFrame sum() method.
More simply:
result.div(result.sum(axis=1), axis=0)
(Edited to use code highlighting)
Try the following:
In [1]: import pandas as pd
In [2]: df = pd.read_csv("test.csv")
In [3]: df
Out[3]:
id value1 value2 value3
0 A 1 2 3
1 B 4 5 6
2 C 7 8 9
In [4]: df["sum"] = df.sum(axis=1)
In [5]: df
Out[5]:
id value1 value2 value3 sum
0 A 1 2 3 6
1 B 4 5 6 15
2 C 7 8 9 24
In [6]: df_new = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)
In [7]: df_new
Out[7]:
value1 value2 value3
0 0.166667 0.333333 0.500
1 0.266667 0.333333 0.400
2 0.291667 0.333333 0.375
Or you can do the following:
In [8]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df["sum"], axis=0)
In [9]: df
Out[9]:
id value1 value2 value3 sum
0 A 0.166667 0.333333 0.500 6
1 B 0.266667 0.333333 0.400 15
2 C 0.291667 0.333333 0.375 24
Or just straight up from the beginning:
In [10]: df = pd.read_csv("test.csv")
In [11]: df
Out[11]:
id value1 value2 value3
0 A 1 2 3
1 B 4 5 6
2 C 7 8 9
In [12]: df.loc[:,"value1":"value3"] = df.loc[:,"value1":"value3"].div(df.sum(axis=1), axis=0)
In [13]: df
Out[13]:
id value1 value2 value3
0 A 0.166667 0.333333 0.500
1 B 0.266667 0.333333 0.400
2 C 0.291667 0.333333 0.375
Changing the column value1
and the like to your headers should work similarly.
easier to work per column:
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]])
(df.T / df.T.sum()).T
result:
0 1 2
0 0.166667 0.333333 0.500
1 0.266667 0.333333 0.400
2 0.291667 0.333333 0.375
The following seemed to work fine for me:
In [39]:
cols = ['0','1','2','3','4','5','6','7','8','9','10','11','12','13']
result[cols] = result[cols].apply(lambda row: row / row.sum(axis=1), axis=1)
result
Out[39]:
0 1 2 3 4 5 6 \
user_id
2 0.864827 0.059749 0.023540 0.018503 0.022280 0.004806 0.000797
4 0.837285 0.018345 0.049453 0.025258 0.052732 0.002437 0.004077
16 0.912269 0.046174 0.017810 0.011214 0.011214 0.000660 0.000000
50 0.754286 0.137143 0.064762 0.009524 0.034286 0.000000 0.000000
51 0.401868 0.120099 0.041265 0.085403 0.286491 0.032437 0.001232
7 8 9 10 11 12 13 \
user_id
2 0.000000 0.000154 0.000077 0.001079 0.001439 0.002364 0.000385
4 0.000000 0.005406 0.001019 0.003456 0.000266 0.000177 0.000089
16 0.000000 0.000000 0.000000 0.000660 0.000000 0.000000 0.000000
50 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
51 0.000513 0.012113 0.005235 0.003285 0.000924 0.006364 0.002772
group
user_id
2 l-1
4 l-2
16 l-2
50 l-3
51 l-4
OK scratch the above, the following will be much faster:
result[cols] = result[cols].div(result[cols].sum(axis=1), axis=0)
And just to prove the result is the same:
In [47]:
cols = ['0','1','2','3','4','5','6','7','8','9','10','11','12','13']
np.alltrue(result[cols].div(result[cols].sum(axis=1), axis=0) == result[cols].apply(lambda row: row / row.sum(axis=1), axis=1))
Out[47]:
True
And that it's faster:
In [48]:
cols = ['0','1','2','3','4','5','6','7','8','9','10','11','12','13']
%timeit result[cols].div(result[cols].sum(axis=1), axis=0)
%timeit result[cols].apply(lambda row: row / row.sum(axis=1), axis=1)
100 loops, best of 3: 2.38 ms per loop
100 loops, best of 3: 4.47 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With