I am grouping information from an Excel file with Pandas, with this information i am summing the values in order to get a summary of the data.
It's pulling certain info from an excel, then grouping columns in order to reflect the summary i want.
summary_df = df.groupby(["NAME", "CITY"])["QUANTITY"].sum().reset_index()
summary_df.loc["Grand Total"] = summary_df["QUANTITY"].sum()
This is returning the information I want AND giving me the summed total but it is showing the quantity summary on each of the columns for the dataframe like this:
NAME | CITY | QUANTITY
JOHN | LONDON | 50
STEVE | PARIS | 100
GRAND TOTAL | 150 | 150
I only want to see the grand total under QUANTITY and no other columns.
You can add your total row to your groupby series and then convert to a dataframe via reset_index. Currently, you are populating the entire row of a dataframe with a scalar, which triggers Pandas to use broadcasting.
df = pd.DataFrame([['A', 'LONDON', 10], ['A', 'LONDON', 20],
['B', 'CHICAGO', 30], ['C', 'CHICAGO', 20]],
columns=['NAME', 'CITY', 'QUANTITY'])
df_summary = df.groupby(['NAME', 'CITY'])['QUANTITY'].sum()
df_summary.loc['Grand Total'] = df_summary.sum()
df_summary = df_summary.reset_index()
print(df_summary)
NAME CITY QUANTITY
0 A LONDON 30
1 B CHICAGO 30
2 C CHICAGO 20
3 Grand Total 80
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With