In pandas, how can I get a DataFrame as the output when I sum a DataFrame?

When I sum a DataFrame, the result is a Series:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

In [3]: df
Out[3]: 
   a  b  c
0  1  2  3
1  2  3  3

In [4]: s = df.sum()

In [5]: type(s)
Out[5]: pandas.core.series.Series

I know I can construct a new DataFrame from this Series, but is there a more "pandasic" way?

461

asked May 09 '13 10:05

waitingkuo

2 Answers

I'm going to go ahead and say... "No". I don't think there is a direct way to do it; the pandastic (and Pythonic) way is to be explicit:

pd.DataFrame(df.sum(), columns=['sum'])

or, more elegantly, using a dictionary (be aware that this copies the summed array):

pd.DataFrame({'sum': df.sum()})
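As a minimal sketch of the two explicit constructions above, here they are applied to the df from the question. The names via_columns and via_dict are only illustrative. Both should give a one-column DataFrame indexed by the original column labels:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# Wrap the summed Series explicitly. The Series is unnamed, so the
# column label comes from the columns argument.
via_columns = pd.DataFrame(df.sum(), columns=['sum'])

# Same idea through a dictionary mapping the column name to the Series.
via_dict = pd.DataFrame({'sum': df.sum()})

print(via_columns)
print(via_dict)
```

In both cases the index is a, b, c, so the sums stay attached to the columns they came from.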

As @root notes, it's faster to sum the underlying NumPy array directly (this assumes import numpy as np):

pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])

(As the Zen of Python states, "practicality beats purity", so if the time difference matters to you, use this.)
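One thing to watch with the NumPy route: np.sum on df.values returns a plain array, so the column labels are not carried over to the result's index. The sketch below shows this on the question's df. Passing index=df.columns is an addition here, not part of the original suggestion, and the names fast and fast_labeled are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# Summing the raw values gives an ndarray, so the result gets a
# default integer index instead of the labels a, b, c.
fast = pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])

# Supplying the original column labels as the index restores them.
fast_labeled = pd.DataFrame(np.sum(df.values, axis=0),
                            index=df.columns, columns=['sum'])

print(fast)
print(fast_labeled)
```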

However, perhaps the most pandastic way is to just use the Series! :)

Some %timeits for your tiny example:

In [11]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
1000 loops, best of 3: 356 us per loop

In [12]: %timeit pd.DataFrame({'sum': df.sum()})
1000 loops, best of 3: 462 us per loop

In [13]: %timeit  pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
1000 loops, best of 3: 205 us per loop

and for a slightly larger one:

In [21]: df = pd.DataFrame(np.random.randn(100000, 3), columns=list('abc'))

In [22]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
100 loops, best of 3: 7.99 ms per loop

In [23]: %timeit pd.DataFrame({'sum': df.sum()})
100 loops, best of 3: 8.3 ms per loop

In [24]: %timeit  pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
100 loops, best of 3: 2.47 ms per loop
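To re-run this comparison outside IPython, something like the following sketch could be used. It swaps %timeit for the standard library's timeit module, and the candidates dictionary and the run count of 20 are arbitrary choices for illustration. Absolute numbers will depend on the machine and the pandas version, so they may not match the figures above:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 3), columns=list('abc'))

# The three constructions being compared, wrapped as zero-argument callables.
candidates = {
    'constructor': lambda: pd.DataFrame(df.sum(), columns=['sum']),
    'dictionary': lambda: pd.DataFrame({'sum': df.sum()}),
    'numpy': lambda: pd.DataFrame(np.sum(df.values, axis=0), columns=['sum']),
}

runs = 20  # arbitrary; raise it for steadier numbers

# Average seconds per call for each candidate.
timings = {name: timeit.timeit(fn, number=runs) / runs
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e3:.3f} ms per call')
```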

140

answered Sep 22 '22 05:09

Andy Hayden

Often you need not only to convert the column sums into a DataFrame, but also to transpose the result so that the original columns remain columns. This can be done by chaining two methods:

df.sum().to_frame().transpose()
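A small sketch of what this produces on the question's df follows. Naming the column 'sum' in to_frame and the alternative df.agg(['sum']) are additions for illustration, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# to_frame accepts a name for the new column; after transposing,
# that name becomes the label of the single row.
row = df.sum().to_frame('sum').transpose()

# agg with a list of function names returns a DataFrame directly,
# with one row per function.
row_agg = df.agg(['sum'])

print(row)
print(row_agg)
```

Either way the result has one row labeled sum and the columns a, b, c, which is convenient if the totals should line up with the original frame.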

answered Sep 21 '22 05:09

Plo_Koon

Tags: python, pandas, dataframe
