
Pandas: sum of values in one dataframe based on the group in a different dataframe

I have a dataframe that contains companies with their sectors

  Symbol             Sector
0    MCM             Industrials
1    AFT             Health Care
2    ABV             Health Care
3    AMN             Health Care
4    ACN  Information Technology

I have another dataframe that contains companies with their positions

  Symbol  Position
0    ABC  1864817
1    AAP -3298989
2    ABV -1556626
3    AXC  2436387
4    ABT   878535 

What I want is a dataframe that contains the aggregate position for each sector, that is, the sum of the positions of all the companies in that sector. I can select the positions for a single sector with

df2[df2.Symbol.isin(df1.groupby('Sector').get_group('Industrials')['Symbol'].to_list())]  

I am looking for a more efficient pandas approach than looping over each sector in the groupby. The final dataframe should look like the following:

     Sector                  Sum Position
0    Industrials             14567232
1    Health Care            -329173249
2    Information Technology -65742234
3    Energy                  6574352342
4    Pharma                  6342387658

Any help is appreciated.

Fizi asked Dec 08 '22 11:12


2 Answers

If I understood the question correctly, one way to do it is to merge the two dataframes on Symbol, then group by sector and sum the Position column, like so:

df_agg = df1.merge(df2, on='Symbol').drop('Symbol', axis=1)
df_agg.groupby('Sector').sum()

Here df1 is the dataframe with sectors and df2 is the dataframe with positions. The merge matches rows by Symbol rather than by row index, so the two dataframes do not need to be in the same order.
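As a minimal, self-contained sketch of the join-then-group idea, here matching companies by Symbol with `merge`: the sample symbols and position values below are made up for illustration, and the output column name "Sum Position" is taken from the question's desired result.

```python
import pandas as pd

# Hypothetical sample data (not the asker's real data).
df1 = pd.DataFrame({
    "Symbol": ["MCM", "AFT", "ABV", "AMN", "ACN"],
    "Sector": ["Industrials", "Health Care", "Health Care",
               "Health Care", "Information Technology"],
})
df2 = pd.DataFrame({
    "Symbol": ["MCM", "AFT", "ABV", "ACN", "XYZ"],
    "Position": [100, -50, 30, 70, 999],
})

# Inner merge keeps only symbols present in both frames:
# "XYZ" has no sector and "AMN" has no position, so both drop out.
merged = df1.merge(df2, on="Symbol", how="inner")

# Sum positions per sector and name the column as in the question.
sector_totals = (
    merged.groupby("Sector", as_index=False)["Position"].sum()
    .rename(columns={"Position": "Sum Position"})
)
print(sector_totals)
```

Whether unmatched symbols should be dropped or kept is a judgment call; `how="left"` or `how="right"` would keep them on one side instead.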

RafaJM answered May 18 '23 18:05


You can map the Symbol column to sector and use that Series to group.

df2.groupby(df2.Symbol.map(df1.set_index('Symbol').Sector)).Position.sum()
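
The one-liner above can be unpacked into steps, as a rough sketch on made-up sample data. It assumes each symbol appears only once in df1, so that the Symbol-to-Sector lookup is unambiguous; the `rename` and `reset_index` calls are additions to shape the result like the table in the question.

```python
import pandas as pd

# Hypothetical sample data (not the asker's real data).
df1 = pd.DataFrame({
    "Symbol": ["MCM", "AFT", "ABV", "AMN", "ACN"],
    "Sector": ["Industrials", "Health Care", "Health Care",
               "Health Care", "Information Technology"],
})
df2 = pd.DataFrame({
    "Symbol": ["MCM", "AFT", "ABV", "ACN", "XYZ"],
    "Position": [100, -50, 30, 70, 999],
})

# Lookup Series: index is Symbol, values are Sector.
symbol_to_sector = df1.set_index("Symbol")["Sector"]

# Sector of each row in df2; symbols missing from df1 become NaN.
# map() keeps the name "Symbol", so rename it to label the group key.
sector_of_position = df2["Symbol"].map(symbol_to_sector).rename("Sector")

# Group by the mapped Series; rows with a NaN key are dropped by default.
result = (
    df2.groupby(sector_of_position)["Position"].sum()
    .rename("Sum Position")
    .reset_index()
)
print(result)
```

This avoids building a merged dataframe; the grouping key is just a Series aligned with df2's rows.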

ALollz answered May 18 '23 17:05