Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate all combinations in pandas

Tags:

python

pandas

I have a dataframe like:

df = pd.DataFrame([['A', 3, 'fox'], ['A', 3, 'cat'], ['A', 3, 'dog'],
                   ['B', 2, 'rabbit'], ['B', 2, 'dog'], ['B', 2, 'eel'],
                   ['C', 6, 'fox'], ['C', 6, 'elephant']],
                  columns=['group', 'val', 'animal'])
df

Output:

    group   val animal
0   A       3   fox
1   A       3   cat
2   A       3   dog
3   B       2   rabbit
4   B       2   dog
5   B       2   eel
6   C       6   fox
7   C       6   elephant

For a given group, the val is always the same (so always 3 for A, 2 for B, 6 for C).

How can I generate a dataframe with all combinations of group and animal elements? Also val should be carried over and there should be a column indicating whether this row was present in the original data or added in the permutations.

Desired result:

df = pd.DataFrame([['A', 3, 'fox', 1], ['A', 3, 'cat', 1], ['A', 3, 'dog', 1], ['A', 3, 'rabbit', 0], ['A', 3, 'eel', 0], ['A', 3, 'elephant', 0],
                   ['B', 2, 'rabbit', 1], ['B', 2, 'dog', 1], ['B', 2, 'eel', 1], ['B', 2, 'fox', 0], ['B', 2, 'cat', 0], ['B', 2, 'elephant', 0],
                   ['C', 6, 'fox', 1], ['C', 6, 'elephant', 1], ['C', 6, 'cat', 0], ['C', 6, 'dog', 0], ['C', 6, 'rabbit', 0], ['C', 6, 'eel', 0]], 
                  columns=['group', 'val', 'animal', 'occurred'])
df

Output:

    group   val animal  occurred
0   A       3   fox         1
1   A       3   cat         1
2   A       3   dog         1
3   A       3   rabbit      0
4   A       3   eel         0
5   A       3   elephant    0
6   B       2   rabbit      1
7   B       2   dog         1
8   B       2   eel         1
9   B       2   fox         0
10  B       2   cat         0
11  B       2   elephant    0
12  C       6   fox         1
13  C       6   elephant    1
14  C       6   cat         0
15  C       6   dog         0
16  C       6   rabbit      0
17  C       6   eel         0

How can I do this?

Edit: there have been a few answers which work. I'll give 'best' to whichever can be made to handle the possibility of multiple columns joined to group (e.g. not just 'val' but ['val1','val2']) in an elegant way.

like image 648
Mobeus Zoom Avatar asked Mar 02 '23 03:03

Mobeus Zoom


1 Answers

One solution would be to use reindex with MultiIndex:

mux = pd.MultiIndex.from_product([df['group'].unique(), df['animal'].unique()], names=('group','animal'))
df = df.set_index(['group','animal']).reindex(mux).reset_index()
df['occurred'] = df['val'].notnull().astype(int)
df['val'] = df.groupby('group')['val'].transform('first')

Result:

   group    animal  val  occurred
0      A       fox  3.0         1
1      A       cat  3.0         1
2      A       dog  3.0         1
3      A    rabbit  3.0         0
4      A       eel  3.0         0
5      A  elephant  3.0         0
6      B       fox  2.0         0
7      B       cat  2.0         0
8      B       dog  2.0         1
9      B    rabbit  2.0         1
10     B       eel  2.0         1
11     B  elephant  2.0         0
12     C       fox  6.0         1
13     C       cat  6.0         0
14     C       dog  6.0         0
15     C    rabbit  6.0         0
16     C       eel  6.0         0
17     C  elephant  6.0         1

Extension:

To handle multiple val columns, use a list of column names instead of only 'val'. Only the last two rows are changed slightly.

val_cols = ['val1', 'val2']

mux = pd.MultiIndex.from_product([df['group'].unique(), df['animal'].unique()], names=('group','animal'))
df = df.set_index(['group','animal']).reindex(mux).reset_index()
df['occurred'] = df[val_cols[0]].notnull().astype(int)
df[val_cols ] = df.groupby('group')[val_cols].transform('first')
like image 90
Shaido Avatar answered Mar 05 '23 18:03

Shaido