I'm going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.
I've got a table that looks like this from Khan Academy:
             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500
I would like to recreate this table using pandas. Of course I could create a DataFrame using something like
"Graduate": {...}, "Undergraduate": {...}, "Total": {...},
But that seems like a naive approach that would both fall over quickly and just not really be extensible.
I've got the non-totals part of the table like this:
df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
df
I've been looking and found a couple of promising things, like:
df['Total'] = df.sum(axis=1)
But I didn't find anything terribly elegant.
I did find the crosstab function, which looks like it should do what I want, but it seems that in order to use it I'd have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I've already got an aggregate.
I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:
totals(df, rows=True, columns=True)
or something.
Does this exist in pandas, or do I have to just cobble together my own approach?
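On the crosstab point: pd.crosstab does not strictly need one row per student. It also accepts values and aggfunc arguments, so counts that are already aggregated can be summed, and margins=True adds the totals. A minimal sketch, assuming the counts are first laid out in long form (the column names grades, level and count are made up for illustration):

```python
import pandas as pd

# Long-form version of the aggregate table: one row per cell.
# The column names here are illustrative, not from the question.
long = pd.DataFrame(
    {
        "grades": ["Straight A's", "Not", "Straight A's", "Not"],
        "level": ["Undergraduate", "Undergraduate", "Graduate", "Graduate"],
        "count": [240, 3_760, 60, 440],
    }
)

# values + aggfunc make crosstab sum the existing counts instead of
# counting rows; margins=True appends a totals row and column.
table = pd.crosstab(
    index=long["grades"],
    columns=long["level"],
    values=long["count"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table)
```

This is more setup than the two-line .sum() approach, so it is probably only worth it if the data is already in long form.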
The sum() method adds all values in each column and returns the sum for each column. By specifying the column axis ( axis='columns' ), the sum() method searches column-wise and returns the sum of each row.
To sum across each row of a DataFrame, call sum() with axis=1. This adds up the values in each row and returns one total per row.
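To make the two axis values concrete, here is a small sketch using the table from the question (the variable names column_sums and row_sums are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# axis=0 collapses the rows: one sum per column.
column_sums = df.sum(axis=0)

# axis=1 collapses the columns: one sum per row.
row_sums = df.sum(axis=1)

print(column_sums)
print(row_sums)
```

Both calls return a Series: column_sums is indexed by the column names, row_sums by the row labels.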
To sum all columns of a pandas DataFrame into a new column, use the DataFrame sum() method. The syntax for summing each column and each row is:
(1) Sum each column: df.sum(axis=0)
(2) Sum each row: df.sum(axis=1)
Or in two steps, using the .sum() function as you suggested (which might be a bit more readable as well):
import pandas as pd

df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)

# Total sum per column:
df.loc['Total', :] = df.sum(axis=0)
# Total sum per row:
df.loc[:, 'Total'] = df.sum(axis=1)
Output:
              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
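As far as I know, pandas has no built-in totals(df, rows=True, columns=True), but the two steps above wrap up easily. A rough sketch of such a helper (add_totals and its parameters are hypothetical names, not part of pandas):

```python
import pandas as pd


def add_totals(df, rows=True, columns=True, label="Total"):
    """Return a copy of df with a totals row and/or a totals column.

    Hypothetical helper, not a pandas function. Assumes every column
    in df is numeric.
    """
    out = df.copy()
    if columns:
        # Row-wise sums become a new column.
        out[label] = out.sum(axis=1)
    if rows:
        # Column-wise sums become a new row. If the totals column was
        # just added, its sum is the grand total.
        out.loc[label] = out.sum(axis=0)
    return out


df = pd.DataFrame(
    {
        "Undergraduate": {"Straight A's": 240, "Not": 3_760},
        "Graduate": {"Straight A's": 60, "Not": 440},
    }
)
result = add_totals(df)
print(result)
```

Because it works on a copy, the original df stays free of totals, which matters if you want to keep computing on the raw counts afterwards.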