I have a dataframe like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)
and an array like this:
a = np.array([
    [1, 5],
    [3, 4]
])
I would now like to add an additional column D to df which contains the word "found" for every row whose values of A and B match one of the rows of a.
A straightforward implementation would be:
for li in a.tolist():
    # mask of rows whose (A, B) pair equals the current row of a
    m = (df['A'] == li[0]) & (df['B'] == li[1])
    df.loc[m, 'D'] = "found"
which gives the desired outcome:
   A  B  C      D
0  1  5  a  found
1  2  2  b    NaN
2  3  4  c  found
3  2  1  d    NaN
4  3  4  e  found
5  1  5  f  found
Is there a solution which would avoid the loop?
One option is to use merge with indicator:
out = df.merge(pd.DataFrame(a, columns=['A', 'B']), how='left', indicator="D")
out['D'] = np.where(out['D'].eq("both"), "Found", "Not Found")
print(out)
   A  B  C          D
0  1  5  a      Found
1  2  2  b  Not Found
2  3  4  c      Found
3  2  1  d  Not Found
4  3  4  e      Found
5  1  5  f      Found
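If you want exactly the output from the question ("found" or NaN) rather than "Found"/"Not Found", something along these lines might work. This is only a sketch: the names lookup and merged are mine, and the drop_duplicates call is an extra precaution I added, on the assumption that a could contain repeated rows (which would otherwise duplicate the matching rows of df in a left merge).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 2, 3, 1],
    'B': [5, 2, 4, 1, 4, 5],
    'C': list('abcdef'),
})
a = np.array([[1, 5], [3, 4]])

# Turn the array into a lookup frame with the same column names as df.
# drop_duplicates is a precaution (assumption: `a` may contain repeated rows).
lookup = pd.DataFrame(a, columns=['A', 'B']).drop_duplicates()

# indicator=True adds a "_merge" column that is "both" for matched rows
# and "left_only" for rows of df with no matching pair in `a`.
merged = df.merge(lookup, on=['A', 'B'], how='left', indicator=True)

out = merged.drop(columns='_merge')
# Matched rows get "found", everything else stays NaN.
out['D'] = merged['_merge'].eq('both').map({True: 'found', False: np.nan})
print(out)
```

Passing on=['A', 'B'] explicitly is optional here, since merge joins on the shared column names by default, but it makes the intent easier to read.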
Here is one way of doing it using numpy broadcasting:
m = (df[['A', 'B']].values[:, None] == a).all(-1).any(-1)
df['D'] = np.where(m, 'Found', 'Not found')
   A  B  C          D
0  1  5  a      Found
1  2  2  b  Not found
2  3  4  c      Found
3  2  1  d  Not found
4  3  4  e      Found
5  1  5  f      Found
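In case the one-liner is hard to follow, here is the same broadcasting comparison split into steps, with the shape of each intermediate array. The variable names pairs, eq and row_match are just mine for illustration, and I use to_numpy() in place of .values, which should be equivalent here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 2, 3, 1],
    'B': [5, 2, 4, 1, 4, 5],
    'C': list('abcdef'),
})
a = np.array([[1, 5], [3, 4]])

pairs = df[['A', 'B']].to_numpy()   # shape (6, 2): one (A, B) pair per row of df

# Insert a middle axis so (6, 1, 2) broadcasts against (2, 2) to (6, 2, 2):
# eq[i, j, k] is True when column k of df row i equals column k of a[j].
eq = pairs[:, None, :] == a

row_match = eq.all(axis=-1)         # shape (6, 2): df row i equals a[j] in both columns
m = row_match.any(axis=-1)          # shape (6,): df row i equals at least one row of a

df['D'] = np.where(m, 'Found', 'Not found')
print(df)
```

Keep in mind that the intermediate array has len(df) * len(a) * 2 elements, so for a large df combined with a large a the merge approach may be lighter on memory.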