How do I create a sum row and sum column in pandas?

Tags:

pandas

I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I've got a table that looks like this from Khan Academy:

             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

"Graduate": {...}, "Undergraduate": {...}, "Total": {...},

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I've got the non-totals part of the table like this:

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
df

I've been looking and found a couple of promising things, like:

df['Total'] = df.sum(axis=1)

But I didn't find anything terribly elegant.

I did find the crosstab function that looks like it should do what I want, but it seems like in order to do that I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

totals(df, rows=True, columns=True)

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

asked Nov 21 '18 15:11

Wayne Werner

1 Answer

You can do this in two steps with the .sum() method you suggested, which may also be a bit more readable:

import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Total sum per column:
df.loc['Total', :] = df.sum(axis=0)

# Total sum per row:
df.loc[:, 'Total'] = df.sum(axis=1)

Output:

              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
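
As far as I know, pandas has no built-in totals(df, rows=True, columns=True), but the two steps are easy to wrap in a small function. What follows is only a sketch: add_totals and its parameters (rows, columns, name) are hypothetical names chosen here to mirror the call in the question, and the function assumes every column is numeric.

```python
import pandas as pd


def add_totals(df, rows=True, columns=True, name="Total"):
    """Return a copy of df with an optional totals row and/or totals column.

    add_totals is a hypothetical helper, not part of pandas.
    Assumes all columns are numeric.
    """
    out = df.copy()
    if rows:
        # Column sums become a new bottom row labelled `name`.
        totals_row = out.sum(axis=0).to_frame(name).T
        out = pd.concat([out, totals_row])
    if columns:
        # Row sums (including the totals row, if present) become a new column.
        out[name] = out.sum(axis=1)
    return out


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
result = add_totals(df)
print(result)
```

The function returns a new DataFrame and leaves df untouched, so add_totals(df, rows=False) or add_totals(df, columns=False) can be called on the same input to get only one of the two margins. Building the totals row with pd.concat keeps the integer dtype in this example.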

answered Sep 20 '22 06:09

Archie
