I have a dataframe like:
df = pd.DataFrame([['A', 3, 'fox'], ['A', 3, 'cat'], ['A', 3, 'dog'],
['B', 2, 'rabbit'], ['B', 2, 'dog'], ['B', 2, 'eel'],
['C', 6, 'fox'], ['C', 6, 'elephant']],
columns=['group', 'val', 'animal'])
df
Output:
group val animal
0 A 3 fox
1 A 3 cat
2 A 3 dog
3 B 2 rabbit
4 B 2 dog
5 B 2 eel
6 C 6 fox
7 C 6 elephant
For a given group, the val
is always the same (so always 3 for A
, 2 for B
, 6 for C
).
How can I generate a dataframe with all combinations of group
and animal
elements? Also val
should be carried over and there should be a column indicating whether this row was present in the original data or added in the permutations.
Desired result:
df = pd.DataFrame([['A', 3, 'fox', 1], ['A', 3, 'cat', 1], ['A', 3, 'dog', 1], ['A', 3, 'rabbit', 0], ['A', 3, 'eel', 0], ['A', 3, 'elephant', 0],
['B', 2, 'rabbit', 1], ['B', 2, 'dog', 1], ['B', 2, 'eel', 1], ['B', 2, 'fox', 0], ['B', 2, 'cat', 0], ['B', 2, 'elephant', 0],
['C', 6, 'fox', 1], ['C', 6, 'elephant', 1], ['C', 6, 'cat', 0], ['C', 6, 'dog', 0], ['C', 6, 'rabbit', 0], ['C', 6, 'eel', 0]],
columns=['group', 'val', 'animal', 'occurred'])
df
Output:
group val animal occurred
0 A 3 fox 1
1 A 3 cat 1
2 A 3 dog 1
3 A 3 rabbit 0
4 A 3 eel 0
5 A 3 elephant 0
6 B 2 rabbit 1
7 B 2 dog 1
8 B 2 eel 1
9 B 2 fox 0
10 B 2 cat 0
11 B 2 elephant 0
12 C 6 fox 1
13 C 6 elephant 1
14 C 6 cat 0
15 C 6 dog 0
16 C 6 rabbit 0
17 C 6 eel 0
How can I do this?
Edit: there have been a few answers which work. I'll give 'best' to whichever can be made to handle the possibility of multiple columns joined to group
(e.g. not just 'val'
but ['val1','val2']
) in an elegant way.
One solution would be to use reindex
with MultiIndex
:
mux = pd.MultiIndex.from_product([df['group'].unique(), df['animal'].unique()], names=('group','animal'))
df = df.set_index(['group','animal']).reindex(mux).reset_index()
df['occurred'] = df['val'].notnull().astype(int)
df['val'] = df.groupby('group')['val'].transform('first')
Result:
group animal val occurred
0 A fox 3.0 1
1 A cat 3.0 1
2 A dog 3.0 1
3 A rabbit 3.0 0
4 A eel 3.0 0
5 A elephant 3.0 0
6 B fox 2.0 0
7 B cat 2.0 0
8 B dog 2.0 1
9 B rabbit 2.0 1
10 B eel 2.0 1
11 B elephant 2.0 0
12 C fox 6.0 1
13 C cat 6.0 0
14 C dog 6.0 0
15 C rabbit 6.0 0
16 C eel 6.0 0
17 C elephant 6.0 1
Extension:
To handle multiple val columns, use a list of column names instead of only 'val'
. Only the last two rows are changed slightly.
val_cols = ['val1', 'val2']
mux = pd.MultiIndex.from_product([df['group'].unique(), df['animal'].unique()], names=('group','animal'))
df = df.set_index(['group','animal']).reindex(mux).reset_index()
df['occurred'] = df[val_cols[0]].notnull().astype(int)
df[val_cols ] = df.groupby('group')[val_cols].transform('first')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With