
Create a DataFrame of combinations for each group with pandas

Tags: python, pandas

The inputs are as follows.

Out[178]: 
  group value
0     A     a
1     A     b
2     A     c
3     A     d
4     B     c
5     C     d
6     C     e
7     C     a
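
For anyone reproducing this, a frame like the one above can be built roughly as follows. This is only a sketch: the name `df` is an assumption, chosen because it is what the answers below use.

```python
import pandas as pd

# Rebuild the input shown above; `df` is the name the answers assume.
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C'],
    'value': ['a', 'b', 'c', 'd', 'c', 'd', 'e', 'a'],
})
print(df)
```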

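For this input, I want to generate every 2-element combination of the values within each group and collect the results in a single DataFrame (group B has only one value, so it yields no rows). How can I do this?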

The output I want to get:

Out[180]: 
  group  0  1
0     A  a  b
1     A  a  c
2     A  a  d
3     A  b  c
4     A  b  d
5     A  c  d
0     C  d  e
1     C  d  a
2     C  e  a
Keiku asked Mar 14 '18 06:03


3 Answers

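Use itertools.combinations in a list comprehension:

from itertools import combinations
import pandas as pd

# Iterating a SeriesGroupBy yields (group name, Series of values) pairs;
# combinations(g, 2) then yields every unordered pair within that group.
pd.DataFrame([
    [n, x, y]
    for n, g in df.groupby('group')['value']
    for x, y in combinations(g, 2)
], columns=['group', 0, 1])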

  group  0  1
0     A  a  b
1     A  a  c
2     A  a  d
3     A  b  c
4     A  b  d
5     A  c  d
6     C  d  e
7     C  d  a
8     C  e  a
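
The output requested in the question restarts the index within each group (0 to 5 for A, 0 to 2 for C), whereas the frame above is numbered 0 to 8. If that matters, one possible variant is sketched here: it tracks each pair's position with `enumerate` and passes it as the index. The names `rows` and `out` are illustrative, and `df` is assumed to be the question's input.

```python
from itertools import combinations
import pandas as pd

# Assumed input, copied from the question.
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C'],
    'value': ['a', 'b', 'c', 'd', 'c', 'd', 'e', 'a'],
})

# One tuple per pair: (position within group, group, first, second).
rows = [
    (i, n, x, y)
    for n, g in df.groupby('group')['value']
    for i, (x, y) in enumerate(combinations(g, 2))
]

# Use the per-group position as the index, the rest as columns.
out = pd.DataFrame(
    [r[1:] for r in rows],
    columns=['group', 0, 1],
    index=[r[0] for r in rows],
)
print(out)
```

The index of `out` is then not unique across groups, so label-based lookups such as `out.loc[0]` return one row per group.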
piRSquared answered Nov 01 '22 19:11


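Use groupby with a lambda function and combinations:

from itertools import combinations
import pandas as pd

# Build one small DataFrame of pairs per group, then drop the inner
# index level and turn the group key back into a column.
df = (df.groupby('group')['value']
        .apply(lambda x: pd.DataFrame(list(combinations(x, 2))))
        .reset_index(level=1, drop=True)
        .reset_index())
print(df)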
  group  0  1
0     A  a  b
1     A  a  c
2     A  a  d
3     A  b  c
4     A  b  d
5     A  c  d
6     C  d  e
7     C  d  a
8     C  e  a
jezrael answered Nov 01 '22 18:11


This can be achieved with itertools and a list comprehension:

from itertools import combinations, chain
import pandas as pd

# For each group, prefix every pair of its values with the group label.
gen = ([(g,) + pair for pair in combinations(df.loc[df['group'] == g, 'value'], 2)]
       for g in df['group'].unique())

df_out = pd.DataFrame(list(chain.from_iterable(gen)), columns=['group', 0, 1])

Result

  group  0  1
0     A  a  b
1     A  a  c
2     A  a  d
3     A  b  c
4     A  b  d
5     A  c  d
6     C  d  e
7     C  d  a
8     C  e  a
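
As a rough sanity check on a result like this, a group with n values should contribute n*(n-1)/2 rows, which is why B (a single value) does not appear. A minimal sketch, assuming the question's input and Python 3.8+ for `math.comb`; the name `expected` is illustrative:

```python
from math import comb
import pandas as pd

# Assumed input, copied from the question.
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'C'],
    'value': ['a', 'b', 'c', 'd', 'c', 'd', 'e', 'a'],
})

# Number of 2-element combinations each group should produce.
expected = df.groupby('group')['value'].size().map(lambda n: comb(n, 2))
print(expected.to_dict())
print(expected.sum())
```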
jpp answered Nov 01 '22 18:11