I have two columns in a dataframe. The first contains a string in each row; the second contains a set of strings in each row. How can I check, for each row, whether the value from the first column is in the set from the second, efficiently and using pandas functions?
import numpy as np
import pandas as pd

df = pd.DataFrame([np.random.randint(5, size=12), np.random.randint(5, size=(12, 5))]).T
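If you prefer reproducible sample data, a sketch along these lines builds the same kind of frame from a dict of columns. It assumes NumPy's `default_rng` generator with an arbitrary seed of 0, so the values will differ from the output shown below. Unlike the transposed version, column 0 keeps an integer dtype and column 1 holds plain Python lists:

```python
import numpy as np
import pandas as pd

# Seeded generator so the sample is reproducible; seed 0 is an arbitrary choice.
rng = np.random.default_rng(0)

# Column 0: one integer in [0, 5) per row.
# Column 1: a Python list of five integers in [0, 5) per row.
df = pd.DataFrame({
    0: rng.integers(5, size=12),
    1: rng.integers(5, size=(12, 5)).tolist(),
})

print(df.dtypes)
```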
How can I check whether the value in column 0 is in the list in column 1?
Use a list comprehension with zip (IMO this should be faster than apply(axis=1), which builds a pandas Series for every row):
df = df.assign(Check=[a in b for a, b in zip(df[0], df[1])])
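The question itself describes strings and sets rather than integers and lists. The same pattern works unchanged on that data, and with a set in column 1, `a in b` is an average O(1) hash lookup instead of the linear scan done for a list or array. A minimal sketch with made-up data (the fruit names are purely illustrative):

```python
import pandas as pd

# Hypothetical data matching the question: a string per row in column 0
# and a set of strings per row in column 1.
df = pd.DataFrame({
    0: ["apple", "pear", "fig"],
    1: [{"apple", "kiwi"}, {"plum"}, {"fig", "date", "lime"}],
})

# Same list comprehension + zip pattern; `a in b` is a hash lookup for sets.
df = df.assign(Check=[a in b for a, b in zip(df[0], df[1])])
print(df["Check"].tolist())  # [True, False, True]
```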
    0                1  Check
0   4  [4, 4, 2, 3, 0]   True
1   4  [1, 0, 2, 1, 4]   True
2   0  [2, 1, 1, 2, 2]  False
3   0  [0, 3, 3, 2, 3]   True
4   4  [3, 0, 0, 3, 1]  False
5   1  [0, 2, 0, 3, 4]  False
6   0  [4, 3, 4, 1, 1]  False
7   1  [2, 0, 0, 3, 1]   True
8   2  [3, 3, 3, 2, 4]   True
9   2  [3, 0, 0, 4, 1]  False
10  0  [3, 3, 3, 4, 3]  False
11  1  [0, 3, 3, 2, 1]   True
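If you specifically want pandas functions rather than a Python-level loop, one alternative (not the approach above) is to `explode` column 1 into one row per element, compare each element with the row's value from column 0, and reduce per original row with `groupby(...).any()`. This is a sketch on made-up string/set data; it builds an intermediate Series with one entry per set element, so it is not necessarily faster than the list comprehension:

```python
import pandas as pd

# Hypothetical data matching the question: a string and a set of strings per row.
df = pd.DataFrame({
    0: ["apple", "pear", "fig", "kiwi"],
    1: [{"apple", "kiwi"}, {"plum"}, {"fig", "date", "lime"}, set()],
})

# One row per set element; each element keeps the index label of its source row.
# An empty set becomes a single NaN row.
exploded = df[1].explode()

# Repeat column 0 along the exploded index and compare element by element.
targets = df[0].reindex(exploded.index)
matches = pd.Series(exploded.to_numpy() == targets.to_numpy(), index=exploded.index)

# A row is a hit if any of its elements matched.
df["Check"] = matches.groupby(level=0).any()
print(df["Check"].tolist())  # [True, False, True, False]
```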
Performance on the test data: (timing chart not available)
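A minimal sketch for timing the list comprehension against `apply(axis=1)` on your own machine, assuming a larger frame of the same shape as the sample data (10,000 rows and seed 0 are arbitrary choices). It checks that both approaches agree before timing them; results will vary with pandas version and hardware:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: integers in column 0, lists of five integers in column 1.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    0: rng.integers(5, size=n),
    1: rng.integers(5, size=(n, 5)).tolist(),
})


def with_zip(frame):
    # Iterates over plain Python objects; no per-row Series is created.
    return [a in b for a, b in zip(frame[0], frame[1])]


def with_apply(frame):
    # apply(axis=1) builds a Series for each row before calling the lambda.
    return frame.apply(lambda row: row[0] in row[1], axis=1).tolist()


# Both approaches must produce the same flags before timing them.
assert with_zip(df) == with_apply(df)

for fn in (with_zip, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per run")
```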