If df1 is:
   size_a  size_b
0       1       2
1       1       5
2       2       3
3       2       9
4       3       1
5       3       5
6       4       4
and df2 is:
   size_a  size_b
0       1       2
1       2     NaN
2       3     NaN
I want the result to be:
   size_a  size_b
0       1       2
1       2       3
2       2       9
3       3       1
4       3       5
For the intersection, I want to consider only the non-NaN values of df2: wherever df2 has a NaN, that column should be ignored when matching rows.
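To reproduce the example, the two frames can be built as below. This is a sketch for experimenting; the name expected is only an illustrative variable holding the desired output from the question.

```python
import numpy as np
import pandas as pd

# Sample frames from the question; NaN in df2.size_b acts as a wildcard.
df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

# Desired result: a df1 row is kept if it matches some df2 row on every
# column where that df2 row is not NaN.
expected = pd.DataFrame({"size_a": [1, 2, 2, 3, 3],
                         "size_b": [2, 3, 9, 1, 5]})
print(df2)
```

Note that pandas stores df2.size_b as float64 because of the NaN, so it prints 2.0 rather than 2.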
I think you can merge them twice and concat the results:
a. Normal merge (joins on all columns the two frames share):
part1 = pd.merge(df1, df2)
b. Merge the subset of df2 rows with NaNs, joining on size_a only:
nans = df2[df2.size_b.isnull()]
part2 = pd.merge(df1, nans[["size_a"]], on="size_a")
c. Concatenate them:
pd.concat([part1, part2], ignore_index=True)
The result:
   size_a  size_b
0       1       2
1       2       3
2       2       9
3       3       1
4       3       5
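Put together with the sample frames from the question, the three steps above run as one script. This is a sketch; result is just an illustrative name. It assumes, as in the question, that only size_b can be NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

# a. Exact match on every shared column (size_a and size_b).
#    df1 has no NaNs, so df2 rows with NaN in size_b find no partner here.
part1 = pd.merge(df1, df2)

# b. For df2 rows whose size_b is NaN, match on size_a alone.
nans = df2[df2["size_b"].isnull()]
part2 = pd.merge(df1, nans[["size_a"]], on="size_a")

# c. Stack both parts and renumber the index from 0.
result = pd.concat([part1, part2], ignore_index=True)
print(result)
```

Step b only covers one NaN-able column; with several such columns you would need one such merge per NaN pattern in df2.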
One way is to first join on the column(s) that must match exactly, i.e. that never contain a NaN wildcard. This reduces the conditional filters you have to build downstream. In the example above, size_a is such a column:
new_df = df1.merge(df2, how='inner', on='size_a')
Next, keep only the rows where the other column either matches or is NaN in df2:
new_df = new_df[(new_df['size_b_x'] == new_df['size_b_y']) | new_df['size_b_y'].isnull()]
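Finally, drop the unneeded column(s) that came from df2 (marked by the _y suffix), restore the original column name and renumber the rows:
new_df = (new_df.drop(columns='size_b_y')  # positional axis was removed in pandas 2.0
                .rename(columns={'size_b_x': 'size_b'})
                .reset_index(drop=True))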
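Since this approach names the exact-match column(s) explicitly, it extends to several NaN-able columns by applying the "matches or is NaN" test to each of them. Below is a minimal sketch under that assumption; wildcard_merge and its parameters are hypothetical names, not a pandas API. Passing suffixes=("", "_right") keeps df1's column names unchanged, so no renaming is needed afterwards.

```python
import numpy as np
import pandas as pd


def wildcard_merge(left, right, exact_cols, wildcard_cols):
    """Hypothetical helper: inner-join on exact_cols, then keep a row only if,
    for every column in wildcard_cols, left and right agree or right is NaN."""
    merged = left.merge(right, how="inner", on=exact_cols,
                        suffixes=("", "_right"))
    keep = pd.Series(True, index=merged.index)
    for col in wildcard_cols:
        other = merged[col + "_right"]
        # NaN never compares equal, so it is accepted via isnull() instead.
        keep &= (merged[col] == other) | other.isnull()
    # Keep only left's columns (drops the *_right helpers) and renumber rows.
    return merged.loc[keep, list(left.columns)].reset_index(drop=True)


df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

result = wildcard_merge(df1, df2, exact_cols=["size_a"], wildcard_cols=["size_b"])
print(result)
```

As with the steps above, a df1 row that matches several df2 rows will appear once per match.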