Summing up multiple values in single row

Question

Given a dataframe such as the one below, is it possible to add up the country-specific values even when a single row lists multiple countries? For example, the first row contains both Japan and USA, so I would want its value counted once for each: Japan=1, USA=1.

import pandas as pd
import numpy as np

countries=["Europe","USA","Japan"]
data= {'Employees':[1,2,3,4],
    'Country':['Japan;USA','USA;Europe',"Japan","Europe;Japan"]}
df=pd.DataFrame(data)
print(df)

patt = '(' + '|'.join(countries) + ')'
grp = df.Country.str.extractall(pat=patt).values
new_df = df.groupby(grp).agg({'Employees': sum})
print(new_df)

I have tried this, but it raises a "Grouper and axis must be same length" error. Is this the correct way to do it?

ValueError                                Traceback (most recent call last)
<ipython-input-81-53e8e9f0f301> in <module>()
     10 patt = '(' + '|'.join(countries) + ')'
     11 grp = df.Country.str.extractall(pat=patt).values
---> 12 new_df = df.groupby(grp).agg({'Employees': sum})
     13 print(new_df)

    4 frames
    /usr/local/lib/python3.7/dist-packages/pandas/core/groupby/grouper.py in _convert_grouper(axis, grouper)
        842     elif isinstance(grouper, (list, Series, Index, np.ndarray)):
        843         if len(grouper) != len(axis):
    --> 844             raise ValueError("Grouper and axis must be same length")
        845         return grouper
        846     else:

Thus, I would like the end result to be Japan: 8, Europe: 6, USA: 3.

Thanks

RavinderSingh13 · Accepted Answer

Could you please try the following; it was written and tested with the samples shown. It uses the split, explode, and groupby functions of pandas.

df['Country'] = df['Country'].str.split(';')
df.explode('Country').groupby('Country')['Employees'].sum()

Output will be as follows:

Country
Europe  6
Japan   8
USA     3
Name: Employees, dtype: int64
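
For anyone who wants to reproduce this from scratch, the same three steps can be chained in one expression. This is only a sketch under the question's sample data; it uses assign so the original df keeps its string Country column, and the variable name totals is just an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Employees": [1, 2, 3, 4],
    "Country": ["Japan;USA", "USA;Europe", "Japan", "Europe;Japan"],
})

totals = (
    df.assign(Country=df["Country"].str.split(";"))  # lists of countries, df itself untouched
      .explode("Country")                            # one row per (Employees, country) pair
      .groupby("Country")["Employees"]
      .sum()
)
print(totals)
```

DataFrame.explode requires pandas 0.25 or newer.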

Explanation:

First, split the Country column on ; and save the resulting lists back into the same column.
Then call explode on the Country column so that each country gets its own row (with that row's Employees value repeated), group by Country, and sum the Employees column.

Tags: python, pandas, dataframe, pandas-groupby

brezz