I have a list such as
groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]
and a dataframe such as
A 100
B 200
C 300
D 400
I want to sum the values by the groups defined in the list above, to get:
Group1 300
Group2 700
How can I do this using python pandas? Needless to say I am a newbie in pandas. Thanks.
A DataFrame may be grouped by a combination of columns and index levels by specifying the column names as strings and the index levels as pd.Grouper objects. For instance, a DataFrame can be grouped by its second index level together with a column named A.
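A minimal sketch of that combination, assuming a made-up DataFrame with a two-level index (levels named first and second) and a regular column A; the data and names are only for illustration:

```python
import pandas as pd

# Hypothetical data: a two-level index plus a regular column "A"
index = pd.MultiIndex.from_arrays(
    [['bar', 'bar', 'foo', 'foo'], ['one', 'two', 'one', 'two']],
    names=['first', 'second'],
)
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [10, 20, 30, 40]}, index=index)

# Group by the index level "second" (via pd.Grouper) and by the column "A" (by name)
result = df.groupby([pd.Grouper(level='second'), 'A'])['B'].sum()
print(result)
```

The result is indexed by (second, A) pairs, one row per combination that occurs in the data.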
In practice you often need to group a pandas DataFrame by more than one column. You can do so by passing a list of column names to DataFrame.groupby().
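As a small sketch of grouping by several columns, assuming a made-up sales table (the column names region, product and amount are hypothetical):

```python
import pandas as pd

# Hypothetical data, only for illustration
sales = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West', 'East'],
    'product': ['x', 'y', 'x', 'x', 'x'],
    'amount': [10, 20, 30, 40, 50],
})

# Passing a list of column names groups by every (region, product) pair
totals = sales.groupby(['region', 'product'])['amount'].sum()
print(totals)
```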
pandas groupby groups data by category and applies a function to each group, which also helps to aggregate data efficiently. DataFrame.groupby() splits the data into groups based on some criteria; pandas objects can be split on any of their axes.
You can collect DataFrame rows into a list per group by calling DataFrame.groupby() on the column of interest, selecting the column you want as a list, and then using Series.apply(list) to get the list for every group.
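A short sketch of collecting values into one list per group, assuming a made-up frame with columns group and label (both names are hypothetical):

```python
import pandas as pd

# Hypothetical data: which label belongs to which group
df = pd.DataFrame({
    'group': ['Group1', 'Group1', 'Group2', 'Group2'],
    'label': ['A', 'B', 'C', 'D'],
})

# One Python list of labels per group
lists = df.groupby('group')['label'].apply(list)
print(lists)
```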
You need to create a dict from the lists, then groupby with that dict and aggregate with sum:
import pandas as pd
df = pd.DataFrame({'a': ['A', 'B', 'C', 'D'], 'b': [100, 200, 300, 400]})
print (df)
a b
0 A 100
1 B 200
2 C 300
3 D 400
groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]
#http://stackoverflow.com/q/43227103/2901002
d = {k:row[0] for row in groups for k in row[1:]}
print (d)
{'B': 'Group1', 'C': 'Group2', 'D': 'Group2', 'A': 'Group1'}
print (df.set_index('a').groupby(d).sum())
b
Group1 300
Group2 700
The solution can be modified slightly if only column b should be aggregated by sum. Finally, reset_index converts the index to a column.
df1 = df.set_index('a').groupby(pd.Series(d, name='a'))['b'].sum().reset_index()
print (df1)
a b
0 Group1 300
1 Group2 700
df2 = df.groupby(df['a'].map(d))['b'].sum().reset_index()
print (df2)
a b
0 Group1 300
1 Group2 700
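One thing to watch with the map approach: a label that appears in the data but in none of the groups maps to NaN, and groupby drops NaN keys by default, so that row silently disappears from the sums. A sketch of one way to keep such rows, assuming an extra label E and a catch-all bucket named 'Other' (both hypothetical, not part of the question):

```python
import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]
# Same lookup dict as above: label -> group name
d = {k: row[0] for row in groups for k in row[1:]}

# Hypothetical data with a label "E" that is in no group
df = pd.DataFrame({'a': ['A', 'B', 'C', 'D', 'E'],
                   'b': [100, 200, 300, 400, 500]})

# Unmapped labels become NaN; put them in an explicit "Other" bucket instead
keys = df['a'].map(d).fillna('Other')
totals = df.groupby(keys)['b'].sum()
print(totals)
```

Without the fillna step, the row for E would simply be left out of the result.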
Another option, though @jezrael's way seems better:
import pandas as pd
groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]
df0 = pd.melt(pd.DataFrame(groups).set_index(0).T)
df1 = pd.read_clipboard(header=None) # Your example data
df = df1.merge(df0, left_on=0, right_on='value')[['0_y', 1]]
df.columns = ['Group', 'Value']
print(df.groupby('Group').sum())
Value
Group
Group1 300
Group2 700
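If you want to try the merge idea without the clipboard, here is a self-contained sketch. Note that it builds the lookup table with a list comprehension rather than pd.melt, and the column names label and value are my own assumptions:

```python
import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

# The example data typed in directly instead of read from the clipboard
values = pd.DataFrame({'label': ['A', 'B', 'C', 'D'],
                       'value': [100, 200, 300, 400]})

# Long-format lookup table: one row per (group, label) pair
lookup = pd.DataFrame(
    [(row[0], label) for row in groups for label in row[1:]],
    columns=['Group', 'label'],
)

# Attach the group name to each value, then sum per group
merged = values.merge(lookup, on='label')
result = merged.groupby('Group')['value'].sum()
print(result)
```

Building the lookup row by row also works when the groups have different numbers of members.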