
Fill column based on subsets of array

I have a dataframe like this

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)

and an array like this

a = np.array([
    [1, 5],
    [3, 4]
])

I would now like to add a column D to df that contains the word "found" wherever the pair of values (A, B) in a row of df matches one of the rows of a.

A straightforward implementation would be

for li in a.tolist():
    m = (df['A'] == li[0]) & (df['B'] == li[1])
    df.loc[m, 'D'] = "found"

which gives the desired outcome

   A  B  C      D
0  1  5  a  found
1  2  2  b    NaN
2  3  4  c  found
3  2  1  d    NaN
4  3  4  e  found
5  1  5  f  found

Is there a solution that would avoid the loop?

Cleb asked Jan 25 '23 12:01



2 Answers

One option is to use merge with indicator:

out = df.merge(pd.DataFrame(a, columns=['A', 'B']), how='left', indicator="D")
out['D'] = np.where(out['D'].eq("both"), "Found", "Not Found")

print(out)

   A  B  C          D
0  1  5  a      Found
1  2  2  b  Not Found
2  3  4  c      Found
3  2  1  d  Not Found
4  3  4  e      Found
5  1  5  f      Found
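
A note on how the merge above works: with no `on` argument, `merge` joins on the columns the two frames share (here A and B), and `indicator="D"` adds a column D holding "both", "left_only" or "right_only". The following is a minimal sketch, assuming the goal is the exact output from the question (lowercase "found", NaN elsewhere) and that `a` might contain repeated rows. The `lookup` name and the `drop_duplicates` call are additions for illustration and are not part of the answer as posted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)
a = np.array([
    [1, 5],
    [3, 4]
])

# Hypothetical helper frame: one row per (A, B) pair to look up.
# drop_duplicates keeps the left merge from repeating rows of df
# if `a` ever contains the same pair twice.
lookup = pd.DataFrame(a, columns=['A', 'B']).drop_duplicates()

# With no `on`, merge joins on the shared columns A and B.
out = df.merge(lookup, how='left', indicator='D')

# Map the boolean mask to labels; False has no dict entry, so it becomes NaN.
out['D'] = out['D'].eq('both').map({True: 'found'})

print(out)
```

Unlike the loop in the question, this returns a new frame `out` rather than modifying `df` in place.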
anky answered Jan 27 '23 03:01



Here is one way to do it using NumPy broadcasting:

m = (df[['A', 'B']].values[:, None] == a).all(-1).any(-1)
df['D'] = np.where(m, 'Found', 'Not found')

   A  B  C          D
0  1  5  a      Found
1  2  2  b  Not found
2  3  4  c      Found
3  2  1  d  Not found
4  3  4  e      Found
5  1  5  f      Found
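
The one-liner above can be unpacked step by step to see what each axis does. This is only a sketch of the same computation; the intermediate names (`pairs`, `eq`, `row_match`) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2, 3, 2, 3, 1],
        'B': [5, 2, 4, 1, 4, 5],
        'C': list('abcdef')
    }
)
a = np.array([
    [1, 5],
    [3, 4]
])

# The (A, B) pairs of df as a plain array, shape (6, 2).
pairs = df[['A', 'B']].to_numpy()

# Add a middle axis so every row of df is compared with every row of a:
# (6, 1, 2) against (2, 2) broadcasts to (6, 2, 2).
eq = pairs[:, None, :] == a

# A pair matches a row of a only if both of its values agree: shape (6, 2).
row_match = eq.all(axis=-1)

# A row of df is "found" if it matches any row of a: shape (6,).
m = row_match.any(axis=-1)

df['D'] = np.where(m, 'Found', 'Not found')
print(df)
```

The intermediate boolean array has len(df) * len(a) * 2 elements, so memory use grows with the product of the two lengths.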
Shubham Sharma answered Jan 27 '23 02:01
