I have a dataframe and I'm trying to sum two rows without messing up the order of the rows.
> import pandas as pd
> index = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
> test = {'counts': pd.Series([10541, 4143, 736, 18, 45690], index=index), 'percents': pd.Series([23.07, 9.07, 1.61, 0.04, 100], index=index)}
> testdf = pd.DataFrame(test)
                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00
I want this output:
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65   <-- sum of 'Other / unknown' and 'Uncoded & errors'
Total             45690    100.00
This is as close as I've been able to get:
> sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()
   counts  percents
0  754.00      1.65
> sum_ = sum_.rename(index={0: 'Other / unknown'})
                 counts  percents
Other / unknown  754.00      1.65
> testdf.drop(['Other / unknown', 'Uncoded & errors'],inplace=True)
> testdf = pd.concat([testdf, sum_])  # DataFrame.append was removed in pandas 2.0
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Total             45690    100.00
Other / unknown     754      1.65
But this does not preserve the order of the original rows.
I could slice the dataframe and insert the sum_ row between 'Dawn' and 'Total', but that breaks if the row labels or the row order ever change. This is for an annual brochure, so the table design may change from year to year, and I'm trying to do this robustly.
To get the sum of each row of a DataFrame, call sum() with axis=1. With axis=1, the values in each row are added across the columns.
To sum only specific rows, select them with the .loc indexer. A label slice written with the : operator names the first and last row to include, and both endpoints are part of the selection. With .loc you can also choose which columns to include. A row-wise result can be stored in a new column.
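As a minimal sketch of the .loc selection described above, here it is applied to the table from the question. The variable names subtotal and counts_only are only illustrative.

```python
import pandas as pd

testdf = pd.DataFrame(
    {
        "counts": [10541, 4143, 736, 18, 45690],
        "percents": [23.07, 9.07, 1.61, 0.04, 100.0],
    },
    index=["Daylight", "Dawn", "Other / unknown", "Uncoded & errors", "Total"],
)

# A label slice with .loc includes BOTH endpoints, unlike positional slicing.
subtotal = testdf.loc["Other / unknown":"Uncoded & errors"].sum()

# The second argument to .loc restricts the columns that take part in the sum.
counts_only = testdf.loc["Other / unknown":"Uncoded & errors", "counts"].sum()

print(subtotal)
print(counts_only)  # 754
```

Note that a label slice depends on the two rows being adjacent, so it is not robust to reordering. Passing an explicit list of labels, as the question does, avoids that.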
By default, the sum() method adds the values in each column and returns one total per column. With axis='columns' (equivalent to axis=1), it instead adds across the columns and returns one total per row.
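The difference between the two axes is easier to see on a tiny frame. The data below is hypothetical and exists only for this illustration.

```python
import pandas as pd

# Hypothetical data, used only to show the two axes.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# Default (axis=0 / axis='index'): one total per column.
col_sums = df.sum()                # a -> 6, b -> 60

# axis=1 / axis='columns': one total per row.
row_sums = df.sum(axis="columns")  # x -> 11, y -> 22, z -> 33

# The per-row totals can be stored as a new column.
df["row_total"] = row_sums
print(df)
```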
You can add a row to a pandas DataFrame by assigning to a new label with df.loc[label] = ['col-1-value', 'col-2-value', 'col-3-value']. This does not work with df.iloc, which cannot enlarge the frame.
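A short sketch of adding a row by label, using two rows from the question's table as a starting point. The new row is always placed at the end, which is exactly the ordering problem the question runs into.

```python
import pandas as pd

df = pd.DataFrame(
    {"counts": [10541, 4143], "percents": [23.07, 9.07]},
    index=["Daylight", "Dawn"],
)

# Assigning to a label that does not exist yet appends a new row at the end.
df.loc["Total"] = [45690, 100.0]

print(df)
```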
Although I prefer MaxU's answer, you can also try summing in-place:
testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']
And then deleting the row by index:
testdf.drop(['Uncoded & errors'], inplace=True)
In [28]: testdf
Out[28]:
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
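If this has to be repeated each year with different labels, the two steps above could be wrapped in a small helper. This is only a sketch: merge_rows is a hypothetical name, not a pandas function, and it works on a copy so the original frame is left untouched.

```python
import pandas as pd


def merge_rows(df, keep, absorb):
    """Add row `absorb` into row `keep`, then drop `absorb`.

    Hypothetical helper: returns a new frame, and `keep` stays in its
    original position, so the row order is preserved.
    """
    out = df.copy()
    out.loc[keep] = out.loc[keep] + out.loc[absorb]
    return out.drop(index=absorb)


testdf = pd.DataFrame(
    {
        "counts": [10541, 4143, 736, 18, 45690],
        "percents": [23.07, 9.07, 1.61, 0.04, 100.0],
    },
    index=["Daylight", "Dawn", "Other / unknown", "Uncoded & errors", "Total"],
)

result = merge_rows(testdf, keep="Other / unknown", absorb="Uncoded & errors")
print(result)
```

Because the rows are addressed by label rather than position, this keeps working if the table is reordered, though it will raise a KeyError if either label is renamed.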
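Use groupby(..., sort=False).sum(). With sort=False the groups stay in the order in which they first appear, so the original row order is kept: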
In [84]: (testdf.reset_index()
....: .replace({'index': {'Uncoded & errors':'Other / unknown'}})
....: .groupby('index', sort=False).sum()
....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
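A possible variation on the same idea skips reset_index: rename the index label first, then group on the index level itself. This is a sketch rather than part of the original answer, and the mapping dict is an assumption that can be extended to fold several labels into one.

```python
import pandas as pd

testdf = pd.DataFrame(
    {
        "counts": [10541, 4143, 736, 18, 45690],
        "percents": [23.07, 9.07, 1.61, 0.04, 100.0],
    },
    index=["Daylight", "Dawn", "Other / unknown", "Uncoded & errors", "Total"],
)

# Map each label to be absorbed onto the label that should survive.
mapping = {"Uncoded & errors": "Other / unknown"}

merged = (
    testdf.rename(index=mapping)      # creates duplicate index labels
    .groupby(level=0, sort=False)     # sort=False keeps first-appearance order
    .sum()
)
print(merged)
```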