pandas: sum two rows of dataframe without rearranging dataframe?

Tags:

python

pandas

I have a dataframe and I'm trying to sum two rows without messing up the order of the rows.

import pandas as pd

idx = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
test = {'counts': pd.Series([10541, 4143, 736, 18, 45690], index=idx),
        'percents': pd.Series([23.07, 9.07, 1.61, 0.04, 100], index=idx)}
testdf = pd.DataFrame(test)

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00

I want this output:

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      754      1.65   <-- sum of 'other/unknown' and 'uncoded & errors'
Total              45690    100.00

This is as close as I've been able to get:

sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()

     counts   percents
0    754.00   1.65       

sum_ = sum_.rename(index={0: 'Other / unknown'})

                counts   percents
Other / unknown 754.00   1.65   

testdf.drop(['Other / unknown', 'Uncoded & errors'], inplace=True)
testdf = pd.concat([testdf, sum_])  # DataFrame.append was removed in pandas 2.0

Daylight         10541  23.07
Dawn             4143   9.07
Total            45690  100
Other / unknown  754    1.65

But this does not preserve the order of the original rows.

I could insert the row by slicing the dataframe and placing the sum_ row between 'Dawn' and 'Total', but that will break if the row labels or their order ever change (this is an annual brochure, so the table design might change from year to year). I'm trying to do this robustly.
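One way to keep the approach above but get the order back is to record the row labels before dropping anything and then reindex at the end. This is a minimal sketch, not code from the asker or either answer; the names keep, extra and original_order are illustrative, and pd.concat stands in for DataFrame.append, which no longer exists in pandas 2.x.

```python
import pandas as pd

idx = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame({'counts': [10541, 4143, 736, 18, 45690],
                       'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
                      index=idx)

keep, extra = 'Other / unknown', 'Uncoded & errors'  # illustrative labels

# Remember the current row order before anything is dropped.
original_order = list(testdf.index)

# Sum the two rows into a one-row frame labelled with the surviving label.
sum_ = testdf.loc[[keep, extra]].sum().to_frame(keep).T

# Drop both rows and add the combined row back (it lands at the end for now).
testdf = pd.concat([testdf.drop([keep, extra]), sum_])

# Restore the original order, minus the label that was merged away.
testdf = testdf.reindex([label for label in original_order if label != extra])

# Summing a mixed int/float row upcasts counts to float; cast it back.
testdf = testdf.astype({'counts': 'int64'})
print(testdf)
```

Because the order is read from the frame itself every time, this should keep working if the labels or their order change between editions, as long as the two labels being merged are named correctly.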


asked Jun 21 '16 14:06

ale19

2 Answers

Although I prefer MaxU's answer, you can also sum in place:

testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']

Then delete the merged-away row by its label:

testdf.drop(['Uncoded & errors'], inplace=True)

In [28]: testdf
Out[28]: 
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
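The in-place approach above could be wrapped in a small function so a yearly script only has to name the two rows. merge_rows is a hypothetical helper, sketched under the assumption that both labels exist in the index; it works on a copy so the caller's frame is left untouched.

```python
import pandas as pd


def merge_rows(df: pd.DataFrame, keep: str, extra: str) -> pd.DataFrame:
    """Hypothetical helper: add row `extra` into row `keep`, then drop `extra`.

    Assigning through .loc to an existing label edits that row where it is,
    so `keep` stays in its original position.
    """
    out = df.copy()
    out.loc[keep] += out.loc[extra]
    return out.drop(index=extra)


idx = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame({'counts': [10541, 4143, 736, 18, 45690],
                       'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
                      index=idx)

merged = merge_rows(testdf, 'Other / unknown', 'Uncoded & errors')
print(merged)
```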

answered Nov 05 '22 03:11

peterfields

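Use groupby(..., sort=False).sum(). With sort=False the groups keep the order in which they first appear, so the merged row stays where 'Other / unknown' was: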

In [84]: (testdf.reset_index()
   ....:        .replace({'index': {'Uncoded & errors':'Other / unknown'}})
   ....:        .groupby('index', sort=False).sum()
   ....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
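A possible variant of the same idea skips the reset_index/replace round trip: rename the index labels directly and group on the index level. This is a sketch assuming a plain (non-MultiIndex) row index; the mapping dict is illustrative, and adding more entries to it should fold several rows in one pass.

```python
import pandas as pd

idx = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame({'counts': [10541, 4143, 736, 18, 45690],
                       'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
                      index=idx)

# Labels to fold into another label; anything not listed is left alone.
mapping = {'Uncoded & errors': 'Other / unknown'}

# rename() may create duplicate labels; grouping on the index level with
# sort=False then sums them while keeping first-appearance order.
combined = (testdf.rename(index=mapping)
                  .groupby(level=0, sort=False)
                  .sum())
print(combined)
```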

answered Nov 05 '22 03:11

MaxU - stop WAR against UA
