I would like to count, row by row, how many strings the lists in column A and column B have in common. Each cell in columns A and B is a list of strings. For example, column A may contain [car, passenger, truck] and column B may contain [car, house, flower, truck]. Since 2 strings overlap in this case, column C should display 2.
I have tried the following (neither works):
df['unique'] = np.unique(frame[['colA', 'colB']])
or
def unique(colA, colB):
    unique1 = list(set(colA) & set(colB))
    return unique1
df['unique'] = df.apply(unique, args=(df['colA'], frame['colB']))
TypeError: ('unique() takes 2 positional arguments but 3 were given', 'occurred at index article')
I believe you need the length of the set intersection, computed with set.intersection in a list comprehension:
df['C'] = [len(set(a).intersection(b)) for a, b in zip(df.A, df.B)]
Or:
df['C'] = [len(set(a) & set(b)) for a, b in zip(df.A, df.B)]
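Both lines use the same set semantics: each list is converted to a set, so a string that appears several times in one list is counted only once. A minimal sketch on plain Python lists (the helper name count_common is hypothetical, used only for illustration):

```python
def count_common(a, b):
    """Number of distinct strings present in both lists (hypothetical helper)."""
    return len(set(a) & set(b))

colA = ['car', 'passenger', 'truck']
colB = ['car', 'house', 'flower', 'truck']
print(count_common(colA, colB))  # 2

# set(a).intersection(b) gives the same count; b does not need to be a set.
print(len(set(colA).intersection(colB)))  # 2

# Duplicates collapse: 'car' appears twice in the first list but counts once.
print(count_common(['car', 'car', 'truck'], ['car']))  # 1
```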
Sample:
import pandas as pd

df = pd.DataFrame(data={'A': [['car', 'passenger', 'truck'], ['car', 'truck']],
                        'B': [['car', 'house', 'flower', 'truck'], ['car', 'house']]})
print(df)
A B
0 [car, passenger, truck] [car, house, flower, truck]
1 [car, truck] [car, house]
df['C'] = [len(set(a).intersection(b)) for a, b in zip(df.A, df.B)]
print(df)
A B C
0 [car, passenger, truck] [car, house, flower, truck] 2
1 [car, truck] [car, house] 1
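The apply attempt in the question fails because DataFrame.apply calls the function as func(column, *args), so with args=(...) holding two columns the function receives three arguments, which matches the TypeError. A sketch of a working apply version, assuming the same column names A and B as the sample above: pass axis=1 so the function receives one row at a time and reads both cells from it. Row-wise apply is typically slower than the list comprehension on large DataFrames, so the comprehension remains the simpler choice.

```python
import pandas as pd

df = pd.DataFrame(data={'A': [['car', 'passenger', 'truck'], ['car', 'truck']],
                        'B': [['car', 'house', 'flower', 'truck'], ['car', 'house']]})

def count_common(row):
    # Each row arrives as a Series; read both list cells from it.
    return len(set(row['A']) & set(row['B']))

# axis=1 passes one row at a time to count_common.
df['C'] = df.apply(count_common, axis=1)
print(df['C'].tolist())  # [2, 1]
```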