df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]})
and I want to add a column to df1 indicating that the fields overlap with other fields, sample output:
|fields| overlap?|
|------|---------|
|boy |Y
|apple |N
|toy |N
|orange|Y
|bear |N
|eat |N
|orange|Y
|girl |Y
|red |N
first I will explode fields on df1, but I am not sure what the next steps are to check overlap values between 2 dataframes. Thanks!
You can also do it without apply
. As you said you can explode
, then using isin
you can check whether values exist in df2 which will return True
/ False
and then mapping 'Y' / 'N' on that:
df1_exp = df1.explode('fields',ignore_index=True)
df1_exp['overlap'] = df1_exp['fields'].isin(df2['other fields']).map({True:'Y',False:'N'})
fields overlap
0 boy Y
1 apple N
2 toy N
3 orange Y
4 bear N
5 eat N
6 orange Y
7 girl Y
8 red N
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With