 

Count instances of strings in multiple columns python

I have the following simple DataFrame:

import pandas as pd
df = pd.DataFrame({'column_a': ['a', 'b', 'c', 'd', 'e'],
                   'column_b': ['b', 'x', 'y', 'c', 'z']})


      column_a column_b
0        a        b
1        b        x
2        c        y
3        d        c
4        e        z

I'm looking to display the strings which occur in both columns:

result = ("b", "c")

Thanks

prmlmu asked Nov 21 '18 15:11



2 Answers

intersection

This generalizes over any number of columns.

set.intersection(*map(set, map(df.get, df)))

{'b', 'c'}
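The one-liner relies on the fact that iterating over a DataFrame yields its column labels, so map(df.get, df) returns each column as a Series. The sketch below spells out the same steps with a list comprehension (using df[col], which is equivalent to df.get(col) for columns that exist). It rebuilds the question's df so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'column_a': ['a', 'b', 'c', 'd', 'e'],
                   'column_b': ['b', 'x', 'y', 'c', 'z']})

# Iterating over a DataFrame yields its column labels,
# so this builds one set of unique values per column.
column_sets = [set(df[col]) for col in df]

# set.intersection accepts any number of sets as positional arguments,
# which is why the approach works for any number of columns.
common = set.intersection(*column_sets)

# Sets are unordered, so sort before printing for a stable display.
print(sorted(common))  # ['b', 'c']
```

Note that set.intersection needs at least one argument, so a DataFrame with no columns would raise a TypeError here.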
piRSquared answered Nov 20 '22 03:11



Use Python's built-in set type:

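in_a = set(df.column_a)            # unique values in column_a
in_b = set(df.column_b)            # unique values in column_b
in_both = in_a.intersection(in_b)  # values present in both columns
print(in_both)                     # {'b', 'c'} (set display order may vary)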
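Because sets are unordered, the result above will not necessarily come out in the order shown in the question's ("b", "c") tuple. A pandas-only alternative (not part of the original answer) is Series.isin, which keeps column_a's row order. This is a minimal sketch assuming only two columns are compared:

```python
import pandas as pd

df = pd.DataFrame({'column_a': ['a', 'b', 'c', 'd', 'e'],
                   'column_b': ['b', 'x', 'y', 'c', 'z']})

# Boolean mask: True where a value of column_a also appears anywhere in column_b.
mask = df['column_a'].isin(df['column_b'])

# Keep the matching values in column_a's row order, drop repeats,
# and convert to a tuple to match the format asked for in the question.
result = tuple(df.loc[mask, 'column_a'].drop_duplicates())
print(result)  # ('b', 'c')
```

Unlike the set-based answers, this preserves order, but it does not generalize to more than two columns without chaining additional isin masks.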
T Burgis answered Nov 20 '22 04:11
