Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

detect overlapping values in 2 dataframes

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"], 
                              ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]}) 

and I want to add a column to df1 indicating that the fields overlap with other fields, sample output:

|fields| overlap?|
|------|---------|
|boy   |Y
|apple |N
|toy   |N
|orange|Y
|bear  |N
|eat   |N
|orange|Y
|girl  |Y
|red   |N

first I will explode fields on df1, but I am not sure what the next steps are to check overlap values between 2 dataframes. Thanks!

like image 450
siusiusiu Avatar asked Sep 11 '25 19:09

siusiusiu


1 Answers

You can also do it without apply. As you said you can explode, then using isin you can check whether values exist in df2 which will return True / False and then mapping 'Y' / 'N' on that:

df1_exp = df1.explode('fields',ignore_index=True)
df1_exp['overlap'] = df1_exp['fields'].isin(df2['other fields']).map({True:'Y',False:'N'})


   fields overlap
0     boy       Y
1   apple       N
2     toy       N
3  orange       Y
4    bear       N
5     eat       N
6  orange       Y
7    girl       Y
8     red       N
like image 135
sophocles Avatar answered Sep 14 '25 11:09

sophocles