
Why does pandas.DataFrame.sum(axis=0) return the sum of values in each column when axis=0 represents rows?

In pandas, axis=0 represents rows and axis=1 represents columns. So to get the sum of the values in each row, I call df.sum(axis=0). But it returns the sum of the values in each column, and vice versa. Why?

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})
df.sum(axis=0)

Dataframe:

   x   y
0  1   2
1  2   4
2  3   6
3  4   8
4  5  10

Output:

x    15
y    30

Expected Output:

0     3
1     6
2     9
3    12
4    15
Raja Sekhar asked May 09 '20 01:05



2 Answers

I think the right way to interpret the axis parameter is as the axis you sum 'over' (or 'across'), rather than the 'direction' the result runs in. Specifying axis=0 computes the sum over the rows, giving you a total for each column; axis=1 computes the sum across the columns, giving you a total for each row.
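As a quick check of this reading, here is a minimal sketch that reuses the DataFrame from the question. The variable names `col_totals` and `row_totals` are only illustrative and are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

# axis=0: sum over the rows, so the row axis collapses
# and one total per column is left
col_totals = df.sum(axis=0)

# axis=1: sum across the columns, so the column axis collapses
# and one total per row is left
row_totals = df.sum(axis=1)

print(col_totals.tolist())  # [15, 30]
print(row_totals.tolist())  # [3, 6, 9, 12, 15]
```

So the output the question expected is what df.sum(axis=1) produces.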

anant answered Sep 22 '22 16:09



I was reading the source code of the pandas project, and I think this convention comes from NumPy, where axis is used the same way (0 sums vertically and 1 horizontally). In addition, pandas uses NumPy under the hood to compute this sum.
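To see that the two libraries agree on what axis means, here is a small sketch that compares the sums of a plain NumPy array with the pandas results for the same data. It only shows that the outputs match; it does not show which NumPy function pandas calls internally. The names `arr`, `np_axis0` and `np_axis1` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})
arr = df.to_numpy()  # 2-D array of shape (5, 2): 5 rows, 2 columns

# Summing over axis 0 removes the first dimension (rows): shape (2,)
np_axis0 = arr.sum(axis=0)

# Summing over axis 1 removes the second dimension (columns): shape (5,)
np_axis1 = arr.sum(axis=1)

print(arr.shape, np_axis0.shape, np_axis1.shape)  # (5, 2) (2,) (5,)
print(np_axis0.tolist() == df.sum(axis=0).tolist())  # True
print(np_axis1.tolist() == df.sum(axis=1).tolist())  # True
```

The shapes make the rule easy to remember: the axis you pass is the dimension that disappears from the result.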

In this link you can check that pandas uses the numpy.cumsum function to compute the sum, and this link goes to the NumPy documentation.

If you are looking for a way to remember how to use the axis parameter, anant's answer is a good approach: interpret the sum as being taken over the axis rather than along its direction. So when 0 is specified, you are computing the sum over the rows (iterating over the index, to be more compliant with the pandas docs). When axis is 1, you are iterating over the columns.
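If the numbers are hard to remember, pandas also accepts the string aliases "index" and "columns" for axis, which match the "iterating over the index / over the columns" wording. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

# axis="index" is an alias for axis=0: sum over the index (the rows)
by_index = df.sum(axis="index")

# axis="columns" is an alias for axis=1: sum over the columns
by_columns = df.sum(axis="columns")

print(by_index.equals(df.sum(axis=0)))    # True
print(by_columns.equals(df.sum(axis=1)))  # True
```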

Ivan Terreno answered Sep 22 '22 16:09
