Is there a faster way to replace the two for loops below, assuming the DataFrames are very large? In my real case, each DataFrame has 200 rows and 25 columns.
import numpy as np
import pandas as pd

data_df1 = np.array([['Name','Unit','Attribute','Date'],['a','A',1,2014],['b','B',2,2015],['c','C',3,2016],
                     ['d','D',4,2017],['e','E',5,2018]])
data_df2 = np.array([['Name','Unit','Date'],['a','F',2019],['b','G',2020],['e','H',2021],
                     ['f','I',2022]])
df1 = pd.DataFrame(data=data_df1)
print('df1:')
print(df1)
df2 = pd.DataFrame(data=data_df2)
print('df2:')
print(df2)
row_df1 = [1,2,5]
col_df1 = [1,3]
row_df2 = [1,2,3]
col_df2 = [1,2]
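# Copy each selected cell of df2 into the matching position of df1.
# DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter.
for i in range(len(row_df1)):
    for j in range(len(col_df1)):
        df1.at[row_df1[i], col_df1[j]] = df2.loc[row_df2[i], col_df2[j]]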
print('df1 after operation:')
print(df1)
Expected output:
df1:
      0     1          2     3
0  Name  Unit  Attribute  Date
1     a     A          1  2014
2     b     B          2  2015
3     c     C          3  2016
4     d     D          4  2017
5     e     E          5  2018
df2:
      0     1     2
0  Name  Unit  Date
1     a     F  2019
2     b     G  2020
3     e     H  2021
4     f     I  2022
df1 after operation:
      0     1          2     3
0  Name  Unit  Attribute  Date
1     a     F          1  2019
2     b     G          2  2020
3     c     C          3  2016
4     d     D          4  2017
5     e     H          5  2021
I have tried:
df1.loc[[1,2,5],[1,3]] = df2.loc[[1,2,3],[1,2]]
print('df1:')
print(df1)
print('df2:')
print(df2)
but the result is the following, with unexpected NaN values:
df1:
      0     1          2     3
0  Name  Unit  Attribute  Date
1     a     F          1   NaN
2     b     G          2   NaN
3     c     C          3  2016
4     d     D          4  2017
5     e   NaN          5   NaN
df2:
      0     1     2
0  Name  Unit  Date
1     a     F  2019
2     b     G  2020
3     e     H  2021
4     f     I  2022
Thanks in advance to whoever helps.
Some cleaning:
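def clean_df(df):
    # Promote the first row to column headers
    df.columns = df.iloc[0]
    df.columns.name = None
    # Drop the header row; reset_index() keeps the old labels in an 'index' column
    df = df.iloc[1:].reset_index()
    return df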
df1 = clean_df(df1)
df1
   index  Name  Unit  Attribute  Date
0      1     a     A          1  2014
1      2     b     B          2  2015
2      3     c     C          3  2016
3      4     d     D          4  2017
4      5     e     E          5  2018
df2 = clean_df(df2)
df2
   index  Name  Unit  Date
0      1     a     F  2019
1      2     b     G  2020
2      3     e     H  2021
3      4     f     I  2022
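Use merge, specifying on='Name', so the other columns are not used as join keys. The overlapping columns taken from df2 get the suffix _y, and fillna(df1) restores the original df1 values for rows that have no match in df2.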
cols = ['Name', 'Unit_y', 'Attribute', 'Date_y']
df1 = df1.merge(df2, how='left', on='Name')[cols]\
.rename(columns=lambda x: x.split('_')[0]).fillna(df1)
df1
   Name  Unit  Attribute  Date
0     a     F          1  2019
1     b     G          2  2020
2     c     C          3  2016
3     d     D          4  2017
4     e     H          5  2021
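As an aside on the NaN values in the question's attempt: during .loc assignment pandas aligns the right-hand DataFrame to the target by row and column labels, so target cells whose labels do not exist in the df2 slice (row 5, column 3) end up as NaN. If the goal is a purely positional copy, as in the original loops, one option is to strip the labels first. The following is a minimal sketch, assuming the raw integer-labelled frames from the question (before clean_df) and pandas 0.24 or later for to_numpy():

```python
import numpy as np
import pandas as pd

# Rebuild the question's raw frames (header row kept as data row 0).
df1 = pd.DataFrame(np.array([['Name', 'Unit', 'Attribute', 'Date'],
                             ['a', 'A', 1, 2014], ['b', 'B', 2, 2015],
                             ['c', 'C', 3, 2016], ['d', 'D', 4, 2017],
                             ['e', 'E', 5, 2018]]))
df2 = pd.DataFrame(np.array([['Name', 'Unit', 'Date'],
                             ['a', 'F', 2019], ['b', 'G', 2020],
                             ['e', 'H', 2021], ['f', 'I', 2022]]))

row_df1, col_df1 = [1, 2, 5], [1, 3]
row_df2, col_df2 = [1, 2, 3], [1, 2]

# to_numpy() drops df2's labels, so the 3x2 block is written by position
# instead of being aligned on index and column labels.
df1.loc[row_df1, col_df1] = df2.loc[row_df2, col_df2].to_numpy()
print(df1)
```

This replaces both loops with a single assignment, but it relies on the row and column lists being in corresponding order. The merge approach above is the safer choice when rows should be matched by Name rather than by position.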