While I sum a DataFrame, it returns a Series:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])
In [3]: df
Out[3]:
a b c
0 1 2 3
1 2 3 3
In [4]: s = df.sum()
In [5]: type(s)
Out[5]: pandas.core.series.Series
I know I can construct a new DataFrame by this Series. But, is there any more "pandasic" way?
Pandas DataFrame sum() MethodThe sum() method adds all values in each column and returns the sum for each column. By specifying the column axis ( axis='columns' ), the sum() method searches column-wise and returns the sum of each row.
sum() Method to Find the Sum Ignoring NaN Values. Use the default value of the skipna parameter i.e. skipna=True to find the sum of DataFrame along the specified axis, ignoring NaN values. If you set skipna=True , you'll get NaN values of sums if the DataFrame has NaN values.
To sum all the rows of a DataFrame, use the sum() function and set the axis value as 1. The value axis 1 will add the row values.
Select Cell Value from DataFrame Using df['col_name']. values[] We can use df['col_name']. values[] to get 1×1 DataFrame as a NumPy array, then access the first and only value of that array to get a cell value, for instance, df["Duration"].
I'm going to go ahead and say... "No", I don't think there is a direct way to do it, the pandastic way (and pythonic too) is to be explicit:
pd.DataFrame(df.sum(), columns=['sum'])
or, more elegantly, using a dictionary (be aware that this copies the summed array):
pd.DataFrame({'sum': df.sum()})
As @root notes it's faster to use:
pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
(As the zen of python states: "practicality beats purity", so if you care about this time, use this).
However, perhaps the most pandastic way is to just use the Series! :)
.
Some %timeits for your tiny example:
In [11]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
1000 loops, best of 3: 356 us per loop
In [12]: %timeit pd.DataFrame({'sum': df.sum()})
1000 loops, best of 3: 462 us per loop
In [13]: %timeit pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
1000 loops, best of 3: 205 us per loop
and for a slightly larger one:
In [21]: df = pd.DataFrame(np.random.randn(100000, 3), columns=list('abc'))
In [22]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
100 loops, best of 3: 7.99 ms per loop
In [23]: %timeit pd.DataFrame({'sum': df.sum()})
100 loops, best of 3: 8.3 ms per loop
In [24]: %timeit pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
100 loops, best of 3: 2.47 ms per loop
Often it is necessary not only to convert the sum of the columns into a dataframe, but also to transpose the resulting dataframe. There is also a method for this:
df.sum().to_frame().transpose()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With