I have two DataFrames that contain lists, and I want to keep the elements of the first DataFrame that are also contained in the second DataFrame. Is this possible, or must I try some other data structure?
Example input:
df1:
elem1
a,c,v,b,n
b
c,x,a
df2:
elem2
j,k,a,i,v
o,b
g,f,w
Expected output:
elem
a,v
b
NaN
First of all, you can build a regular expression from the letters you want to match by replacing the commas in elem2 with the alternation operator |:
In [77]:
chars = df2.elem2.str.replace(',' , '|')
chars
Out[77]:
0 j|k|a|i|v
1 o|b
2 g|f|w
Name: elem2, dtype: object
Then concatenate both into a single DataFrame so that a custom function can be applied row by row later:
In [24]:
to_compare = pd.concat([df1 , chars] , axis = 1)
to_compare
Out[24]:
elem1 elem2
0 a,c,v,b,n j|k|a|i|v
1 b o|b
2 c,x,a g|f|w
Finally, use your regular expression to match the data from elem1 (this needs the standard re module):
In [76]:
import re
to_compare.apply(lambda x: ','.join(re.findall(x['elem2'], x['elem1'])), axis=1)
Out[76]:
0 a,v
1 b
2
dtype: object
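As an alternative, here is a minimal sketch that does the same row-by-row comparison with Python sets instead of a regular expression. This is not the approach used above, just a possible variant: a regex built with | can match substrings once the elements are longer than one character, or misbehave if they contain regex metacharacters. The helper name keep_common is hypothetical, and the two frames are rebuilt from the example in the question.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the example in the question.
df1 = pd.DataFrame({"elem1": ["a,c,v,b,n", "b", "c,x,a"]})
df2 = pd.DataFrame({"elem2": ["j,k,a,i,v", "o,b", "g,f,w"]})


def keep_common(left, right):
    """Keep the comma-separated items of `left` that also occur in `right`.

    Hypothetical helper: the order of `left` is preserved, and NaN is
    returned when the two rows have nothing in common.
    """
    allowed = set(right.split(","))
    kept = [item for item in left.split(",") if item in allowed]
    return ",".join(kept) if kept else np.nan


# Compare the rows pairwise, by position.
result = pd.Series(
    [keep_common(a, b) for a, b in zip(df1["elem1"], df2["elem2"])],
    index=df1.index,
    name="elem",
)
print(result)
```

Because the comparison is done on whole items, this version assumes both frames have the same number of rows in the same order, just like the pd.concat step above.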
If you want to convert the empty string in the final result to NaN, I'll leave that for you to figure out on your own :-)
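For readers who would rather not work that out, one possible way is Series.replace with np.nan. The sketch below assumes the matched result is a Series of strings shaped like Out[76]; the variable names matched and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the Series produced by the apply/findall step (Out[76]).
matched = pd.Series(["a,v", "b", ""])

# Replace rows that matched nothing (empty strings) with NaN.
result = matched.replace("", np.nan)
print(result)
```

Only values that are exactly the empty string are replaced, so rows with at least one match are left untouched.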