
pandas: sum two rows of dataframe without rearranging dataframe?

Tags: python, pandas

I have a dataframe and I'm trying to sum two rows without messing up the order of the rows.

> import pandas as pd
> labels = ['Daylight','Dawn','Other / unknown','Uncoded & errors','Total']
> test = {'counts' : pd.Series([10541,4143,736,18,45690], index=labels), 'percents' : pd.Series([23.07,9.07,1.61,0.04,100], index=labels)}

> testdf = pd.DataFrame(test)

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00

I want this output:

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      754      1.65   <-- sum of 'other/unknown' and 'uncoded & errors'
Total              45690    100.00

This is as close as I've been able to get:

> sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()

     counts   percents
0    754.00   1.65       

> sum_ = sum_.rename(index={0: 'Other / unknown'})

                counts   percents
Other / unknown 754.00   1.65   

> testdf.drop(['Other / unknown', 'Uncoded & errors'], inplace=True)
> testdf = pd.concat([testdf, sum_])  # DataFrame.append was removed in pandas 2.0

Daylight         10541  23.07
Dawn             4143   9.07
Total            45690  100
Other / unknown  754    1.65

But this does not preserve the order of the original rows: the summed row ends up after 'Total'.

I could insert the row by slicing the dataframe and placing the sum_ row between 'Dawn' and 'Total', but that would break if the row labels or the row order ever change (this is an annual brochure, so the table design might change from year to year). I'm looking for a robust way to do this.

ale19 asked Jun 21 '16 14:06



2 Answers

Although I prefer MaxU's answer, you can also try summing in-place:

testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']

And then deleting the row by index:

testdf.drop(['Uncoded & errors'], inplace=True)

In [28]: testdf
Out[28]: 
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
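
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same two steps. The variable names `keep` and `absorb` are only illustrative and are not in the original snippet, and the sum is written out as an explicit assignment rather than `+=`.

```python
import pandas as pd

labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

# Illustrative names: the row that stays and the row folded into it.
keep, absorb = 'Other / unknown', 'Uncoded & errors'

# Add the absorbed row onto the kept row; the kept row does not move.
testdf.loc[keep] = testdf.loc[keep] + testdf.loc[absorb]

# Remove the absorbed row; the remaining rows keep their original order.
testdf = testdf.drop(index=absorb)

print(testdf)
```

Because the rows are addressed by label rather than by position, this should keep working if the rows are reordered, as long as both labels still exist.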
peterfields answered Nov 05 '22 03:11


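Use groupby(..., sort=False).sum(): relabel 'Uncoded & errors' as 'Other / unknown', then group by label. With sort=False the groups stay in order of first appearance, so the row order is preserved: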

In [84]: (testdf.reset_index()
   ....:        .replace({'index': {'Uncoded & errors':'Other / unknown'}})
   ....:        .groupby('index', sort=False).sum()
   ....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
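
A possible variation on the same idea, sketched here as a small reusable function: relabel the index directly with `rename` and group on the index level, which avoids the `reset_index()` round trip and the extra `index` name in the output. The helper name `merge_rows` and its `mapping` argument are hypothetical and not part of pandas.

```python
import pandas as pd


def merge_rows(df, mapping):
    """Sum rows whose labels map to the same target label.

    mapping is {label_to_absorb: label_to_keep}. Hypothetical helper,
    returns a new DataFrame and leaves df untouched.
    """
    # After renaming, the absorbed rows share a label with their target.
    relabeled = df.rename(index=mapping)
    # sort=False keeps groups in order of first appearance.
    return relabeled.groupby(level=0, sort=False).sum()


labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

merged = merge_rows(testdf, {'Uncoded & errors': 'Other / unknown'})
print(merged)
```

One thing to keep in mind: the merged row lands wherever the first of the combined labels appears, so if 'Uncoded & errors' ever came before 'Other / unknown' in the table, the summed row would sit at that earlier position.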
MaxU - stop WAR against UA answered Nov 05 '22 03:11