Summing across rows of Pandas Dataframe

Tags:

I have a DataFrame of records that looks something like this:

stocks = pd.Series(['A', 'A', 'B', 'C', 'C'], name = 'stock')
positions = pd.Series([ 100, 200, 300, 400, 500], name = 'positions')
same1 = pd.Series(['AA', 'AA', 'BB', 'CC', 'CC'], name = 'same1')
same2 = pd.Series(['AAA', 'AAA', 'BBB', 'CCC', 'CCC'], name = 'same2')
diff = pd.Series(['A1', 'A2', 'B3' ,'C1', 'C2'], name = 'different')
df = pd.DataFrame([stocks, same1, positions, same2, diff]).T
df

This gives a pandas DataFrame that looks like

      stock same1 positions same2 different
0     A    AA       100   AAA        A1
1     A    AA       200   AAA        A2
2     B    BB       300   BBB        B3
3     C    CC       400   CCC        C1
4     C    CC       500   CCC        C2

I'm not interested in the data in 'different' columns and want to sum the positions along the unique other columns. I am currently doing it by:

df.groupby(['stock','same1','same2'])['positions'].sum()

which gives:

stock  same1  same2
A      AA     AAA      300
B      BB     BBB      300
C      CC     CCC      900
Name: positions

Problem is that this is a pd.Series (with Multi-Index). Currently I iterate over it to build a DataFrame again. I am sure that I am missing a method. Basically I want to drop 1 column from a DataFrame and then "rebuild it" so that one column is summed and the rest of the fields (which are the same) stay in place.

This groupby method breaks if there are empty positions. So I currently use an elaborate iteration over the DataFrame to build a new one. Is there a better approach?

553

asked Jun 18 '13 10:06

Joop

1 Answers

Step 1. Use [['positions']] instead of ['positions']:

In [30]: df2 = df.groupby(['stock','same1','same2'])[['positions']].sum()

In [31]: df2 
Out[31]: 

                   positions
stock same1 same2               
A     AA    AAA          300 
B     BB    BBB          300 
C     CC    CCC          900

Step 2. And then use reset_index to move the index back to the column

In [34]: df2.reset_index()
Out[34]: 
  stock same1 same2  positions
0     A    AA   AAA        300 
1     B    BB   BBB        300 
2     C    CC   CCC        900

EDIT

Seems my method is not so good.

Thanks to @Andy and @unutbu , you can achieve your goal by more elegant ways:

method 1:

df.groupby(['stock', 'same1', 'same2'])['positions'].sum().reset_index()

method 2:

df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()

137

answered Sep 20 '22 05:09

waitingkuo

Related questions
                            
                                Pandas rolling apply with missing data
                            
                                CSV export in Stream (from Django admin on Heroku)
                            
                                Pythonic way to import data from multiple files into an array
                            
                                Python - Understanding the send function of a generator
                            
                                Are there any distributed machine learning libraries for using Python with Hadoop? [closed]
                            
                                Most efficient way to calculate pairwise similarity of 250k lists
                            
                                Writing a Python compiler for practice [closed]
                            
                                Is it possible to specify the previous directory python?
                            
                                Python json.loads doesn't work
                            
                                Iteration over variable names in python?
                            
                                Is there an easy way generate a probable list of words from an unspaced sentence in python?
                            
                                Upper/lower limits with matplotlib
                            
                                Is there a standard way to store XY data in Python?
                            
                                Reading Multiple CSV Files into Python Pandas Dataframe
                            
                                Get progress from async python celery chain by chain id
                            
                                How to store application settings across modules [duplicate]
                            
                                Efficient extraction of a subgraph according to some edge attribute in NetworkX
                            
                                flask jinja2 href not linking correctly
                            
                                Imported modules become None when running a function
                            
                                Periodogram in Octave/Matlab vs Scipy

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Summing across rows of Pandas Dataframe

Tags:

python

pandas

dataframe

Joop

People also ask

1 Answers

EDIT

waitingkuo

Recent Activity

Donate For Us