I have a dataframe like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)
and an array like this:
a = np.array([
    [1, 5],
    [3, 4]
])
I would now like to add a column D to df that contains the word "found" wherever a row's pair of values (A, B) matches one of the rows of a.
A straightforward implementation would be:
for li in a.tolist():
    m = (df['A'] == li[0]) & (df['B'] == li[1])
    df.loc[m, 'D'] = "found"
which gives the desired outcome:
   A  B  C      D
0  1  5  a  found
1  2  2  b    NaN
2  3  4  c  found
3  2  1  d    NaN
4  3  4  e  found
5  1  5  f  found
Is there a solution that would avoid the loop?
One option is to use merge with the indicator parameter. Note that this labels non-matching rows "Not Found" instead of leaving them as NaN:
out = df.merge(pd.DataFrame(a,columns=['A','B']),how='left',indicator="D")
out['D'] = np.where(out['D'].eq("both"),"Found","Not Found")
print(out)
   A  B  C          D
0  1  5  a      Found
1  2  2  b  Not Found
2  3  4  c      Found
3  2  1  d  Not Found
4  3  4  e      Found
5  1  5  f      Found
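To see what the indicator column holds before it is relabelled, the merge can be broken into steps. This is only a sketch, not the answerer's code: the names lookup, merged and matched are made up for illustration, on=['A', 'B'] just spells out the join keys that merge would pick by default here, and drop_duplicates() is an added guard, assuming a might contain repeated rows (a repeated row in a would otherwise repeat the matching rows of df in the result). It writes "found" for matches and leaves the rest as NaN, as in the question's desired outcome.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)
a = np.array([
    [1, 5],
    [3, 4]
])

# Hypothetical helper frame: one row per (A, B) pair to look for.
# drop_duplicates() is an added guard so the left merge cannot multiply rows.
lookup = pd.DataFrame(a, columns=['A', 'B']).drop_duplicates()

# indicator=True adds a '_merge' column whose values are
# 'left_only', 'right_only' or 'both'.
merged = df.merge(lookup, on=['A', 'B'], how='left', indicator=True)

# With how='left', a row is 'both' exactly when its (A, B) pair occurs in a.
matched = merged['_merge'].eq('both')

# "found" for matches, NaN otherwise, then drop the helper column.
merged['D'] = matched.map({True: 'found', False: np.nan})
merged = merged.drop(columns='_merge')
print(merged)
```

A left merge keeps the rows of df in their original order, so the mask lines up with df row by row.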
Here is one way to do it using NumPy broadcasting:
m = (df[['A', 'B']].values[:, None] == a).all(-1).any(-1)
df['D'] = np.where(m, 'Found', 'Not found')
   A  B  C          D
0  1  5  a      Found
1  2  2  b  Not found
2  3  4  c      Found
3  2  1  d  Not found
4  3  4  e      Found
5  1  5  f      Found
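The one-liner above packs three steps together. As a rough sketch of what happens at each step, here it is unrolled with the intermediate shapes in comments; the names pairs, eq and row_match are made up for illustration, and to_numpy() is used in place of .values, which returns the same array here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)
a = np.array([
    [1, 5],
    [3, 4]
])

pairs = df[['A', 'B']].to_numpy()          # shape (6, 2): one (A, B) pair per row

# Insert a new axis so every row of df is compared with every row of a.
eq = pairs[:, None, :] == a[None, :, :]    # shape (6, 2, 2)

# A df row matches a row of a only if both A and B are equal.
row_match = eq.all(axis=-1)                # shape (6, 2)

# It counts as found if it matches any row of a.
m = row_match.any(axis=-1)                 # shape (6,)

df['D'] = np.where(m, 'Found', 'Not found')
print(df)
```

The intermediate array holds len(df) * len(a) * 2 booleans, so this approach trades memory for speed when both inputs are large.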