I have a dataframe and I'm trying to sum two rows without messing up the order of the rows.
> import pandas as pd
> test = {'counts' : pd.Series([10541,4143,736,18,45690], index=['Daylight','Dawn','Other / unknown','Uncoded & errors','Total']), 'percents' : pd.Series([23.07,9.07,1.61,0.04,100], index=['Daylight','Dawn','Other / unknown','Uncoded & errors','Total'])}
> testdf = pd.DataFrame(test)
                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00
I want this output:
                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      754      1.65   <-- sum of 'Other / unknown' and 'Uncoded & errors'
Total              45690    100.00
This is as close as I've been able to get:
> sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()
     counts   percents
0    754.00   1.65       
> sum_ = sum_.rename(index={0: 'Other / unknown'})
                counts   percents
Other / unknown 754.00   1.65   
> testdf.drop(['Other / unknown', 'Uncoded & errors'], inplace=True)
> testdf = pd.concat([testdf, sum_])  # DataFrame.append was removed in pandas 2.0
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Total             45690       100
Other / unknown     754      1.65
But this does not preserve the order of the original rows: the combined row ends up after 'Total'.
I could insert the row by slicing the DataFrame and placing sum_ between 'Dawn' and 'Total', but that would break if the row labels or their order ever change. This table is for an annual brochure, so its design might change from year to year, and I'm looking for a robust approach.
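One way to keep the drop-and-append idea but make it robust is to remember the original index before changing anything, then reindex the result by it, skipping labels that were dropped. This is a sketch, not taken from the answers below: it rebuilds `testdf` from the question, uses `pd.concat` in place of the removed `DataFrame.append`, and the names `original_order`, `combined` and `result` are illustrative.

```python
import pandas as pd

# Rebuild the question's table
labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame({
    'counts': [10541, 4143, 736, 18, 45690],
    'percents': [23.07, 9.07, 1.61, 0.04, 100.0],
}, index=labels)

# Remember the row order before anything is moved (illustrative name)
original_order = testdf.index

# Sum the two rows into a one-row frame labelled 'Other / unknown'
sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().T
sum_.index = ['Other / unknown']

# Drop the originals and append the combined row at the end
combined = pd.concat([testdf.drop(['Other / unknown', 'Uncoded & errors']), sum_])

# Restore the original order, skipping labels that no longer exist
result = combined.reindex([label for label in original_order if label in combined.index])
print(result)
```

Because the ordering comes from the saved index rather than from positions, it still works if rows are added or reordered in a future year, as long as the combined row reuses one of the original labels. Note that `counts` becomes a float column after the concat because `sum_` is all floats.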
To sum the values in each row of a DataFrame, call sum() with axis=1. This adds the values across the columns and returns one total per row.
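A minimal sketch of per-row totals with `axis=1`, on a small made-up frame (the column names `a`, `b`, `c` are hypothetical):

```python
import pandas as pd

# Hypothetical frame with three numeric columns
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [100, 200, 300]})

# axis=1 adds across the columns, giving one total per row
row_totals = df.sum(axis=1)
print(row_totals.tolist())  # [111, 222, 333]
```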
To sum only specific rows, select them with the .loc indexer, giving the first and last row labels separated by the : operator (label slices include both ends). With .loc you can also choose which columns to include, and the result can be stored in a new column.
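A sketch of the `.loc` approach on a hypothetical frame with row labels `w` to `z`. Unlike positional slices, the label slice `'x':'y'` includes the end label `'y'`:

```python
import pandas as pd

# Hypothetical frame with labelled rows
df = pd.DataFrame(
    {'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40], 'c': [100, 200, 300, 400]},
    index=['w', 'x', 'y', 'z'],
)

# Rows 'x' through 'y' (inclusive), columns 'a' and 'b' only, summed per column
subset_sum = df.loc['x':'y', ['a', 'b']].sum()
print(subset_sum.tolist())  # [5, 50]

# Store a per-row sum of selected columns in a new column
df['a_plus_b'] = df.loc[:, ['a', 'b']].sum(axis=1)
print(df['a_plus_b'].tolist())  # [11, 22, 33, 44]
```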
By default, sum() adds all values in each column and returns one sum per column. With axis='columns' (equivalent to axis=1), it adds across the columns instead and returns the sum of each row.
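To contrast the two directions on one hypothetical frame: the default reduces each column, while `axis='columns'` gives the same per-row result as `axis=1`:

```python
import pandas as pd

# Hypothetical two-column frame
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Default (axis=0, also spelled axis='index'): one sum per column
col_sums = df.sum()
print(col_sums.tolist())  # [6, 60]

# axis='columns' is an alias for axis=1: one sum per row
row_sums = df.sum(axis='columns')
print(row_sums.tolist())  # [11, 22, 33]
```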
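You can add a row to a pandas DataFrame with a statement such as df.loc[new_label] = ['col-1-value', 'col-2-value', 'col-3-value']. Note that .iloc cannot do this: it only addresses existing positions, so assigning past the last row raises an IndexError.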
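A small sketch of enlarging a DataFrame with `.loc`, and of `.iloc` refusing to do the same. The column and row labels (`col-1`, `first`, `second`) and the flag `iloc_failed` are placeholders:

```python
import pandas as pd

# Hypothetical one-row frame
df = pd.DataFrame({'col-1': [1], 'col-2': [2], 'col-3': [3]}, index=['first'])

# Assigning to a new label with .loc appends a row
df.loc['second'] = [4, 5, 6]
print(df.index.tolist())  # ['first', 'second']

# .iloc only addresses existing positions, so position 2 raises IndexError
iloc_failed = False
try:
    df.iloc[2] = [7, 8, 9]
except IndexError as exc:
    iloc_failed = True
    print('iloc cannot add rows:', exc)
```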
Although I prefer MaxU's answer, you can also try summing in-place:
testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']
Then drop the now-redundant row by its label:
testdf.drop(['Uncoded & errors'], inplace=True)
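For reference, here is the in-place approach as a self-contained script that rebuilds `testdf` from the question. Because the sum is written into the existing 'Other / unknown' row, that row never moves, so no reordering is needed:

```python
import pandas as pd

# Rebuild the question's table
labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame({
    'counts': [10541, 4143, 736, 18, 45690],
    'percents': [23.07, 9.07, 1.61, 0.04, 100.0],
}, index=labels)

# Add the 'Uncoded & errors' values into the 'Other / unknown' row;
# the target row keeps its original position
testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']

# Remove the row that has been folded in
testdf = testdf.drop(index='Uncoded & errors')
print(testdf)
```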
In [28]: testdf
Out[28]: 
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
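Use groupby(..., sort=False).sum(): first relabel 'Uncoded & errors' as 'Other / unknown', then group by the label. With sort=False the groups keep the order in which they first appear, so the original row order is preserved: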
In [84]: (testdf.reset_index()
   ....:        .replace({'index': {'Uncoded & errors':'Other / unknown'}})
   ....:        .groupby('index', sort=False).sum()
   ....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
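The same idea can be wrapped in a small reusable function. This is a sketch, not part of the answer above: `merge_rows` is a hypothetical name, and instead of `reset_index`/`replace` it relabels the index with `rename` and groups on the index level directly. Only labels are mapped, so it does not depend on row positions, which fits the concern about the table layout changing between years:

```python
import pandas as pd

def merge_rows(df, sources, target):
    """Hypothetical helper: fold the rows labelled in `sources` into `target`,
    keeping rows in order of first appearance."""
    mapping = {label: target for label in sources}
    # rename() relabels matching index entries (duplicates are allowed);
    # groupby(level=0, sort=False) then sums rows sharing a label and keeps
    # the groups in the order they first appear
    return df.rename(index=mapping).groupby(level=0, sort=False).sum()

# Rebuild the question's table
labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame({
    'counts': [10541, 4143, 736, 18, 45690],
    'percents': [23.07, 9.07, 1.61, 0.04, 100.0],
}, index=labels)

result = merge_rows(testdf, ['Uncoded & errors'], 'Other / unknown')
print(result)
```

If the target label is absent from the table, the merged row takes the position of the first source row instead.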