Making a group in dataframe in pandas

I have a list such as

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

and a dataframe such as

A 100
B 200
C 300
D 400

I want to sum the values for each group in the list above, to get:

Group 1 300
Group 2 700

How can I do this with pandas in Python? I am new to pandas. Thanks.

Caglar asked Apr 05 '17 09:04


2 Answers

You need to build a dict from the lists that maps each label to its group name, then pass it to groupby and aggregate with sum:

import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C', 'D'], 'b': [100, 200, 300, 400]})
print(df)
   a    b
0  A  100
1  B  200
2  C  300
3  D  400

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

#http://stackoverflow.com/q/43227103/2901002
# map every member label (row[1:]) to its group name (row[0])
d = {k: row[0] for row in groups for k in row[1:]}
print(d)
{'A': 'Group1', 'B': 'Group1', 'C': 'Group2', 'D': 'Group2'}

# a dict passed to groupby is applied to the index labels, so move 'a' into the index first
print(df.set_index('a').groupby(d).sum())
          b
Group1  300
Group2  700

The solution can be modified slightly if only column b should be aggregated by sum. A final reset_index converts the index back into a column.

# group the index labels by a Series built from the dict, summing only column b
df1 = df.set_index('a').groupby(pd.Series(d, name='a'))['b'].sum().reset_index()
print(df1)
        a    b
0  Group1  300
1  Group2  700

# alternative without set_index: map column a to group names and group by the result
df2 = df.groupby(df['a'].map(d))['b'].sum().reset_index()
print(df2)
        a    b
0  Group1  300
1  Group2  700
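The mapping approach above can also be wrapped in a small helper. This is only a minimal sketch: the names `build_label_map` and `sum_by_groups` and the extra row `E` are made up for illustration and are not part of the original answer. It also shows that a label missing from `groups` is mapped to NaN and left out of the result, because groupby drops NaN keys by default.

```python
import pandas as pd


def build_label_map(groups):
    """Invert [[group, member, ...], ...] into {member: group}."""
    return {member: row[0] for row in groups for member in row[1:]}


def sum_by_groups(df, groups, key="a", value="b"):
    """Sum `value` per group; labels without a group are ignored."""
    label_map = build_label_map(groups)
    mapped = df[key].map(label_map)  # unmapped labels become NaN
    return df.groupby(mapped)[value].sum()


# 'E' is a hypothetical label that belongs to no group
df = pd.DataFrame({"a": ["A", "B", "C", "D", "E"],
                   "b": [100, 200, 300, 400, 500]})
groups = [["Group1", "A", "B"], ["Group2", "C", "D"]]

result = sum_by_groups(df, groups)
print(result)
```

If unmapped labels should be kept instead, one option is to fill the mapped Series with a placeholder group name before grouping.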
jezrael answered Nov 01 '22 17:11


Another option, although @jezrael's way seems better:

import pandas as pd

groups = [['Group1', 'A', 'B'], ['Group2', 'C', 'D']]

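# reshape the group list into a long lookup table:
# column 0 holds the group name, column 'value' holds the member label
df0 = pd.melt(pd.DataFrame(groups).set_index(0).T)
df1 = pd.read_clipboard(header=None)  # Your example data

# both frames have a column 0, so merge adds suffixes; '0_y' is the group name from df0
df = df1.merge(df0, left_on=0, right_on='value')[['0_y', 1]]
df.columns = ['Group', 'Value']

print(df.groupby('Group').sum())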


        Value
Group        
Group1    300
Group2    700
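For readers who cannot use the clipboard, here is a self-contained sketch of the same merge idea. It is a variation rather than the code above: the data frame is built inline from the question's sample values, and the column names `Key`, `Group`, and `Value` are chosen for illustration so that no suffixed column such as `0_y` is needed.

```python
import pandas as pd

groups = [["Group1", "A", "B"], ["Group2", "C", "D"]]

# long lookup table: one row per (group, member) pair
lookup = pd.DataFrame(
    [(row[0], member) for row in groups for member in row[1:]],
    columns=["Group", "Key"],
)

# sample data from the question, built inline instead of read from the clipboard
data = pd.DataFrame({"Key": ["A", "B", "C", "D"],
                     "Value": [100, 200, 300, 400]})

# attach the group name to every row, then sum per group
merged = data.merge(lookup, on="Key")
totals = merged.groupby("Group")["Value"].sum()
print(totals)
```

Because merge defaults to an inner join, rows whose key has no entry in the lookup table are dropped before the sum.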
su79eu7k answered Nov 01 '22 18:11