I just started Python and I'm stuck on a simple exercise. I have 3 DataFrames like below:
df1 :
A B C
0 1 2 3
1 4 5 6
df2 (empty) :
D E F G H
0
1
dfmap :
m1 m2 m3 m4 m5
0 D F H
1 A B C
I want to write a script that fills df2 according to the mapping in dfmap. So the output should be
df2 :
   D  E  F  G  H
0  1     2     3
1  4     5     6
I started this code, but I guess I'm missing all the power of DataFrames (and it doesn't work: Array_df2 is full of NaN). I know there should be a smarter/simpler way to do this.
listcol_df1 = {}
listcol_df2 = {}
for idx, col in enumerate(df1.columns):
    listcol_df1[col] = idx
for idx, col in enumerate(df2.columns):
    listcol_df2[col] = idx
Array_df1 = df1.values
Array_df2 = df2.values
Array_dfmap = dfmap.values
for i in range(df1.shape[0]):
    for j in range(dfmap.shape[1]):
        df2[i][listcol_df2.get(Array_dfmap[0][j])] = Array_df1[i][listcol_df1.get(Array_dfmap[1][j])]
Thanks
You can use dfmap to rename the columns of df1, then use the renamed frame to update df2:
df2.update(df1.rename(columns=dfmap.T.set_index(1)[0]))
print(df2)
Output:
D E F G H
0 1 NaN 2.0 NaN 3.0
1 4 NaN 5.0 NaN 6.0
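For anyone who wants to run this end to end, here is a minimal sketch of the same rename-then-update idea. The way the three frames are built is an assumption (the question doesn't show it): I'm guessing dfmap holds the df2 names in row 0 and the df1 names in row 1, with the unused m4/m5 cells empty, and that df2 starts out filled with NaN. The mapping is also converted to a plain dict here, which is just a stylistic choice.

```python
import numpy as np
import pandas as pd

# Assumed inputs, reconstructed from the question
df1 = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})
df2 = pd.DataFrame(np.nan, index=[0, 1], columns=list("DEFGH"))
dfmap = pd.DataFrame(
    [["D", "F", "H", None, None],   # row 0: target column names in df2
     ["A", "B", "C", None, None]],  # row 1: source column names in df1
    columns=["m1", "m2", "m3", "m4", "m5"],
)

# Transpose so each mapping pair is one row, drop the empty pairs,
# then build a {source name: target name} dict
mapping = dfmap.T.dropna().set_index(1)[0].to_dict()

# Rename df1's columns to their df2 names; update() then copies the
# non-NaN values into the matching columns of df2, in place
df2.update(df1.rename(columns=mapping))
print(df2)
```

Columns E and G stay NaN because nothing in dfmap maps to them.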
Here's an alternative that just loops through the columns of dfmap, but you may need to add exception handling if dfmap contains column names that are not in the other DataFrames:
for col in dfmap:
    df2[dfmap[col].loc[0]] = df1[dfmap[col].loc[1]]
To explain: the loop iterates through the column names in dfmap, and the syntax dfmap[col].loc[X] selects that column in dfmap, then the row (.loc[0] selects the value in the first row, .loc[1] the value in the second row). This could also be written more simply as dfmap.loc[X, col], where X is the row label in each case.
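As a rough sketch of the loop using the dfmap.loc[X, col] form, with a simple membership check standing in for exception handling: the frame construction below is assumed (empty m4/m5 cells in dfmap, df2 pre-filled with NaN), and the names target and source are only illustrative.

```python
import pandas as pd

# Assumed inputs, reconstructed from the question
df1 = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})
df2 = pd.DataFrame(index=df1.index, columns=list("DEFGH"), dtype="float64")
dfmap = pd.DataFrame(
    [["D", "F", "H", None, None],   # row 0: target column names in df2
     ["A", "B", "C", None, None]],  # row 1: source column names in df1
    columns=["m1", "m2", "m3", "m4", "m5"],
)

for col in dfmap.columns:
    target = dfmap.loc[0, col]  # column name to fill in df2
    source = dfmap.loc[1, col]  # column name to read from df1
    # Skip empty cells and names that do not exist in either frame
    if source not in df1.columns or target not in df2.columns:
        continue
    df2[target] = df1[source]

print(df2)
```

Unlike update(), plain column assignment replaces the whole column, so the filled columns take df1's dtype while the untouched ones stay NaN.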