Given a dataframe such as this, is it possible to add up each country's value even when a row lists multiple countries? For example, the first row contains both Japan and USA, so I would want that row's value counted as Japan=1 and USA=1.
import pandas as pd
import numpy as np
countries=["Europe","USA","Japan"]
data= {'Employees':[1,2,3,4],
'Country':['Japan;USA','USA;Europe',"Japan","Europe;Japan"]}
df=pd.DataFrame(data)
print(df)
patt = '(' + '|'.join(countries) + ')'
grp = df.Country.str.extractall(pat=patt).values
new_df = df.groupby(grp).agg({'Employees': sum})
print(new_df)
I have tried this, but it raises a "Grouper and axis must be same length" error. Is this the correct way to do it?
ValueError Traceback (most recent call last)
<ipython-input-81-53e8e9f0f301> in <module>()
10 patt = '(' + '|'.join(countries) + ')'
11 grp = df.Country.str.extractall(pat=patt).values
---> 12 new_df = df.groupby(grp).agg({'Employees': sum})
13 print(new_df)
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/groupby/grouper.py in _convert_grouper(axis, grouper)
842 elif isinstance(grouper, (list, Series, Index, np.ndarray)):
843 if len(grouper) != len(axis):
--> 844 raise ValueError("Grouper and axis must be same length")
845 return grouper
846 else:
Thus, I would like the end result to be Japan: 8, Europe: 6, USA: 3.
Thanks
Could you please try the following, written and tested with the shown samples. It uses the split, explode, and groupby functions of pandas.
df['Country'] = df['Country'].str.split(';')
df.explode('Country').groupby('Country')['Employees'].sum()
Output will be as follows:
Country
Europe 6
Japan 8
USA 3
Name: Employees, dtype: int64
Explanation: A simple explanation would be:
First split the Country column on ; and save the results back into the same column. Then use explode on the Country column so each country gets its own row, group by the Country column, and apply sum to the Employees column to get the total for each country.
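To make the split, explode, and groupby steps easier to follow, here is a minimal sketch that keeps each intermediate result in its own variable and leaves the original df unchanged. The names split_df, exploded, totals, and matches are only illustrative and are not part of the original answer. The last few lines are my reading of why the attempt in the question fails: extractall returns one row per match, so the grouper has more entries than the dataframe has rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Employees": [1, 2, 3, 4],
    "Country": ["Japan;USA", "USA;Europe", "Japan", "Europe;Japan"],
})

# Step 1: split each string on ';' so every cell holds a list of countries.
# assign() returns a new frame, so df itself is not modified.
split_df = df.assign(Country=df["Country"].str.split(";"))

# Step 2: explode gives every list element its own row and repeats
# the Employees value of the source row.
exploded = split_df.explode("Country")

# Step 3: group by the single-country rows and sum Employees.
totals = exploded.groupby("Country")["Employees"].sum()
print(totals)

# Likely cause of the ValueError in the question: extractall yields one
# row per regex match, not one row per dataframe row.
countries = ["Europe", "USA", "Japan"]
patt = "(" + "|".join(countries) + ")"
matches = df["Country"].str.extractall(patt)
print(len(matches), len(df))  # the two lengths differ, so groupby rejects the grouper
```

With the sample data, the exploded frame has as many rows as there are country mentions, which is what lets groupby count each country separately.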