I'd like to convert to below Df1 to Df2.
The empty values would be filled with Nan.
Below Dfs are examples.
IDs are 100,000.
Only week 8 has all IDs, so total rows will be 100,000.
I have Df3 which has 100,000 of id, and I want to merge df1 on Df3 formatted as df2.
ex) pd.merge(df3, df1, on="id", how="left") -> but, formatted as df2
Df1>
wk, id, col1, col2 ...
1 1 0.5 15
2 2 0.5 15
3 3 0.5 15
1 2 0.5 15
3 2 0.5 15
------
Df2>
wk1, id, col1, col2, wk2, id, col1, col2, wk3, id, col1, col2,...
1 1 0.5 15 2 1 Nan Nan 3 1 Nan Nan
1 2 0.5 15 2 2 0.5 15 3 2 0.5 15
1 3 Nan Nan 2 3 Nan Nan 3 3 0.5 15
Use:
#create dictionary for rename columns for correct sorting
d = dict(enumerate(df.columns))
d1 = {v:k for k, v in d.items()}
#first add missing values for each `wk` and `id`
df1 = df.set_index(['wk', 'id']).unstack().stack(dropna=False).reset_index()
#for each id create DataFrame, reshape by unstask and rename columns
df1 = (df1.groupby('id')
.apply(lambda x: pd.DataFrame(x.values, columns=df.columns))
.unstack()
.reset_index(drop=True)
.rename(columns=d1, level=0)
.sort_index(axis=1, level=1)
.rename(columns=d, level=0))
#convert values to integers if necessary
df1.loc[:, ['wk', 'id']] = df1.loc[:, ['wk', 'id']].astype(int)
#flatten MultiIndex in columns
df1.columns = ['{}_{}'.format(a, b) for a, b in df1.columns]
print (df1)
wk_0 id_0 col1_0 col2_0 wk_1 id_1 col1_1 col2_1 wk_2 id_2 col1_2 \
0 1 1 0.5 15.0 2 1 NaN NaN 3 1 NaN
1 1 2 0.5 15.0 2 2 0.5 15.0 3 2 0.5
2 1 3 NaN NaN 2 3 NaN NaN 3 3 0.5
col2_2
0 NaN
1 15.0
2 15.0
You can use GroupBy + concat. The idea is to create a list of dataframes with appropriately named columns and appropriate index. The concatenate along axis=1:
d = {k: v.reset_index(drop=True) for k, v in df.groupby('wk')}
def formatter(df, key):
return df.rename(columns={'w': f'wk{key}'}).set_index('id')
L = [formatter(df, key) for key, df in d.items()]
res = pd.concat(L, axis=1).reset_index()
print(res)
id wk col1 col2 wk col1 col2 wk col1 col2
0 1 1.0 0.5 15.0 NaN NaN NaN NaN NaN NaN
1 2 1.0 0.5 15.0 2.0 0.5 15.0 3.0 0.5 15.0
2 3 NaN NaN NaN NaN NaN NaN 3.0 0.5 15.0
Note NaN forces your series to become float. There's no "good" fix for this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With