Pandas: filter rows where one column contains another column

Tags: python, pandas

How can I filter rows where one column contains the value of another column? For example, given a DataFrame with two columns A and B, can I keep the rows where B contains A? I don't mean checking whether B contains any of the A values from the whole DataFrame, but only the A value from the same row.

A      B
'lol'  'lolec'
'ram'  'rambo'
'ki'   'pio'

Expected result:
A     B
'lol'  'lolec'
'ram'  'rambo'
wowbrowser search asked Jan 24 '17 13:01


1 Answer

You can use boolean indexing with a mask created by apply (with axis=1) and the in operator, which compares columns A and B within each row:

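# if necessary, strip the ' characters from all values
df = df.apply(lambda x: x.str.strip("'"))
# element-wise alternative (in pandas >= 2.1, applymap is deprecated in favor of df.map):
#df = df.applymap(lambda x: x.strip("'"))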

print (df.apply(lambda x: x.A in x.B, axis=1))
0     True
1     True
2    False
dtype: bool

df = df[df.apply(lambda x: x.A in x.B, axis=1)]
print (df)
     A      B
0  lol  lolec
1  ram  rambo
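A self-contained sketch of the same row-wise filter, built from the question's sample data (the literal ' quotes are kept in the input so the stripping step has something to do). The list comprehension over zip(df['A'], df['B']) is an assumption added here, not part of the original answer: it performs the same per-row substring check without building a Series for every row.

```python
import pandas as pd

# Sample data from the question, with the literal ' quotes still present
df = pd.DataFrame({'A': ["'lol'", "'ram'", "'ki'"],
                   'B': ["'lolec'", "'rambo'", "'pio'"]})

# Strip the surrounding ' characters from every (string) column
df = df.apply(lambda col: col.str.strip("'"))

# Approach from the answer: row-wise apply, True where A is a substring of B
mask_apply = df.apply(lambda row: row.A in row.B, axis=1)

# Alternative (not from the answer): plain Python over zipped columns,
# producing a list of booleans that pandas also accepts as a row mask
mask_zip = [a in b for a, b in zip(df['A'], df['B'])]

filtered = df[mask_apply]
filtered_zip = df[mask_zip]
print(filtered)
```

Both masks compare A and B from the same row only, so they select the same rows; the zip version simply skips the per-row Series overhead of apply(axis=1).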

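Difference between the solutions, shown with a changed input DataFrame: the row-wise apply keeps only row 1, while str.contains with all values of A joined into one pattern ('lol|ram|ki') also keeps row 2, because 'lolec' contains 'lol' from row 0: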

print (df)
     A      B
0  lol    pio
1  ram  rambo
2   ki  lolec

print (df[df.apply(lambda x: x.A in x.B, axis=1)])
     A      B
1  ram  rambo

print (df[df['B'].str.contains("|".join(df['A']))])
     A      B
1  ram  rambo
2   ki  lolec
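A self-contained reproduction of the comparison on the changed DataFrame. The re.escape step is an addition not found in the original answer: because "|".join(...) builds a regular expression, values of A containing characters such as '.' or '+' would otherwise be treated as regex syntax rather than matched literally. The row-wise in check has no such issue.

```python
import re
import pandas as pd

# Changed input from the answer: 'lolec' now sits in the row whose A is 'ki'
df = pd.DataFrame({'A': ['lol', 'ram', 'ki'],
                   'B': ['pio', 'rambo', 'lolec']})

# Row-wise check: compares A and B from the same row only
row_wise = df[df.apply(lambda row: row.A in row.B, axis=1)]

# Joined pattern 'lol|ram|ki' is tested against every B value,
# so an A value from any row can produce a match
pattern = "|".join(df['A'])
any_row = df[df['B'].str.contains(pattern)]

# Escaping each value keeps regex metacharacters literal
escaped_pattern = "|".join(map(re.escape, df['A']))
any_row_escaped = df[df['B'].str.contains(escaped_pattern)]

print(row_wise)
print(any_row)
```

With this data no value contains regex metacharacters, so the escaped and unescaped patterns select the same rows.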
jezrael answered Oct 16 '22 11:10