I have two dataFrames :
df1
mag cat
0 101 A1
1 256 A2
2 760 A2
3 888 A3
...
df2
A1 A2 A3 ...
0 E50R AZ33 REZ3
1 T605 YYU6 YHG5
2 IR50 P0O9 BF53
3 NaN YY9I NaN
And I would like to create a final DataFrame which looks like :
df
101 256 760 888 ...
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
I tried something with pivot, but it doesn't seem to do the job Could you help me ?
IIUC reindex
+re-name
newdf=df2.reindex(columns=df1.cat)
newdf.columns=df1.mag
newdf
Out[519]:
mag 101 256 760 888
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
You can use a combination of GroupBy
, numpy.repeat
, itertools.chain
:
from itertools import chain
# map cat to list of mag
s = df1.groupby('cat')['mag'].apply(list)
# calculate indices for columns, including repeats
cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
# apply indexing
res = df2.iloc[:, cols_idx]
# rename columns
res.columns = list(chain.from_iterable(df2.columns.map(s.get)))
print(res)
101 256 760 888
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
Performance benchmarking
Some good and different solutions here, so you may be interested in performance. Wen's reindex
solution is the clear winner.
%timeit wen(df1, df2) # 632 µs per loop
%timeit jpp(df1, df2) # 2.55 ms per loop
%timeit scb(df1, df2) # 7.98 ms per loop
%timeit abhi(df1, df2) # 4.52 ms per loop
Code:
def jpp(df1, df2):
s = df1.groupby('cat')['mag'].apply(list)
cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
res = df2.iloc[:, cols_idx]
res.columns = list(chain.from_iterable(df2.columns.map(s.get)))
return res
def scb(df1, df2):
df_out = (df2.stack().reset_index()
.merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])
return df_out.pivot('level_0','mag',0).reset_index(drop=True)
def abhi(df1, df2):
return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T
def wen(df1, df2):
newdf=df2.reindex(columns=df1.cat)
newdf.columns=df1.mag
return newdf
Another way you can do it using stack
, merge
, and pivot
:
df_out = (df2.stack().reset_index()
.merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])
df_out.pivot('level_0','mag',0).reset_index(drop=True)
Output:
mag 101 256 760 888
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 NaN YY9I YY9I NaN
You can do this by transpose and merge and then set column 'mag' as index. Then again transposing it.
df2_transposed = df2.T
res = df2_transposed.merge(df1,how = "left",left_index=True,right_on='cat')
del res['cat']
res.set_index('mag', inplace=True)
res.T
mag 101 256 760 888
0 E50R AZ33 AZ33 REZ3
1 T605 YYU6 YYU6 YHG5
2 IR50 P0O9 P0O9 BF53
3 YY9I YY9I
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With