I'm new to pandas and am trying to replace the NaN values in a column of df1 with the values from df2 wherever unique_col matches, but I'm getting an error.
df1
unique_col | Measure
944537 NaN
7811403 NaN
8901242114307 1
df2
unique_col | Measure
944537 18
7811403 12
8901242114307 17.5
df1.loc[(df1.unique_col.isin(df2.unique_col) &
df1.Measure.isnull()), ['Measure']] = df2[['Measure']]
I have two dataframes with 3 million records, and running the operation above raises the following error:
ValueError: cannot reindex from a duplicate axis
An easy way to fill NaNs is to use the fillna function. In your case, if you have the dfs as follows (notice the indexes)
unique_col Measure
0 944537 NaN
1 7811403 NaN
2 8901242114307 1.0
unique_col Measure
0 944537 18.0
1 7811403 12.0
2 8901242114307 17.5
you can simply call
>>> df.fillna(df2)
unique_col Measure
0 944537 18.0
1 7811403 12.0
2 8901242114307 1.0
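For reference, here is a minimal, self-contained sketch of the aligned-index case above. The frames are rebuilt by hand from the sample data in the question, and the name `filled` is only illustrative. Note that fillna returns a new frame rather than modifying df in place, so the result has to be assigned.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question; both use the default 0..2 index
df = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [np.nan, np.nan, 1.0],
})
df2 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [18.0, 12.0, 17.5],
})

# fillna aligns on index and column labels and only touches NaN cells;
# it returns a new DataFrame, so keep the result
filled = df.fillna(df2)
print(filled)
```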
If the indexes are not aligned as above, you can set unique_col as the index of both frames and use the same function (fillna returns a new frame, so assign the result)
df = df.set_index('unique_col')
df = df.fillna(df2.set_index('unique_col'))
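The "cannot reindex from a duplicate axis" error usually means pandas had to align on an index that contains repeated labels. With 3 million rows it is plausible that unique_col repeats somewhere, although the question does not confirm this. The following is only a sketch under that assumption: it keeps the first Measure per key in df2 (an arbitrary choice) and looks values up by key with map, so df1 keeps its own rows and index. The names `lookup` and `result`, and the extra duplicate row in df2, are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [np.nan, np.nan, 1.0],
})
# Hypothetical df2 with one repeated key, to mimic the assumed duplicates
df2 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307, 944537],
    "Measure": [18.0, 12.0, 17.5, 99.0],
})

# Assumption: when a key repeats in df2, the first Measure wins
lookup = df2.drop_duplicates("unique_col").set_index("unique_col")["Measure"]

# Translate each key in df1 to its Measure in df2, then fill only the NaNs
result = df1.assign(
    Measure=df1["Measure"].fillna(df1["unique_col"].map(lookup))
)
print(result)
```

If a different rule should decide between duplicate keys (last value, mean, and so on), only the line that builds `lookup` needs to change.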