When I sum a DataFrame, it returns a Series:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])
In [3]: df
Out[3]:
   a  b  c
0  1  2  3
1  2  3  3
In [4]: s = df.sum()
In [5]: type(s)
Out[5]: pandas.core.series.Series
I know I can construct a new DataFrame from this Series. But is there a more "pandasic" way?
Pandas DataFrame sum() method: the sum() method adds all values in each column and returns the sum for each column. With axis='columns', it instead sums across the columns and returns the sum of each row.
sum() ignoring NaN values: with the default skipna=True, sum() ignores NaN values when summing along the specified axis. If you set skipna=False, any row or column that contains a NaN gets NaN as its sum.
To sum the values in each row of a DataFrame, call sum() with axis=1; the result has one entry per row.
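A minimal sketch of the axis and skipna behaviour described above. The frame df_nan and its values are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, for illustration only.
df_nan = pd.DataFrame({'a': [1.0, 2.0], 'b': [np.nan, 3.0]})

col_sums = df_nan.sum()             # default axis=0: one sum per column
row_sums = df_nan.sum(axis=1)       # one sum per row, NaN skipped by default
strict = df_nan.sum(skipna=False)   # NaN propagates into the sum of column 'b'

print(col_sums)
print(row_sums)
print(strict)
```

In each case the result is still a Series, indexed by column labels for axis=0 and by row labels for axis=1.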
I'm going to go ahead and say... "No". I don't think there is a direct way to do it; the pandastic (and Pythonic) way is to be explicit:
pd.DataFrame(df.sum(), columns=['sum'])
or, more elegantly, using a dictionary (be aware that this copies the summed array):
pd.DataFrame({'sum': df.sum()})
As @root notes, it's faster to use NumPy directly (after import numpy as np):
pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
(As the Zen of Python states, "practicality beats purity", so if you care about the time, use this.)
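One thing to watch with the NumPy route: df.values drops the labels, so the resulting frame gets a default integer index instead of the column names. A small sketch, using the question's example frame; passing index=df.columns is my suggested way to restore the labels, not part of @root's original note:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# The raw NumPy sum carries no labels, so the index defaults to 0, 1, 2.
unlabeled = pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])

# Passing the original column labels as the index restores a, b, c.
labeled = pd.DataFrame(np.sum(df.values, axis=0),
                       index=df.columns, columns=['sum'])

print(unlabeled)
print(labeled)
```

The timings in this answer were taken without the index argument, so the labeled variant may cost slightly more.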
However, perhaps the most pandastic way is to just use the Series! :)
Some %timeits for your tiny example:
In [11]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
1000 loops, best of 3: 356 us per loop
In [12]: %timeit pd.DataFrame({'sum': df.sum()})
1000 loops, best of 3: 462 us per loop
In [13]: %timeit pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
1000 loops, best of 3: 205 us per loop
and for a slightly larger one:
In [21]: df = pd.DataFrame(np.random.randn(100000, 3), columns=list('abc'))
In [22]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
100 loops, best of 3: 7.99 ms per loop
In [23]: %timeit pd.DataFrame({'sum': df.sum()})
100 loops, best of 3: 8.3 ms per loop
In [24]: %timeit pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
100 loops, best of 3: 2.47 ms per loop
Often it is necessary not only to convert the sum of the columns into a dataframe, but also to transpose the resulting dataframe. There is also a method for this:
df.sum().to_frame().transpose()
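As a rough sketch of what that gives on the question's frame: to_frame also accepts a name, which labels the column (and, after transposing, the single row). The label 'sum' here is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# Three rows (a, b, c) and a single column named 'sum'.
col_frame = df.sum().to_frame(name='sum')

# Transposed: a single row labeled 'sum' with columns a, b, c.
row_frame = df.sum().to_frame(name='sum').T

print(col_frame)
print(row_frame)
```

Without a name, the column (or the transposed row) is labeled 0, because the Series returned by df.sum() is unnamed.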