Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas efficient check if column contains string in other column

Tags:

python

pandas

I'm trying to get a boolean index of whether one column contains a string from the same row in another column:

a      b
boop   beep bop
zorp   zorpfoo
zip    foo zip fa

In check to see if column b contains a string, I'd like to get:

[False, True, True]

Right now I'm trying this approach, but it is slow:

df.apply(lambda row: row['a'] in row['b'], axis=1)

Is there a .str method for this?

like image 347
Luke Avatar asked Oct 20 '15 19:10

Luke


People also ask

How do you see if values in one column are in another pandas?

You can check if a column contains/exists a particular value (string/int), list of multiple values in pandas DataFrame by using pd. series() , in operator, pandas. series. isin() , str.

How do you check if a column has a string in pandas?

Using “contains” to Find a Substring in a Pandas DataFrame The contains method in Pandas allows you to search a column for a specific substring. The contains method returns boolean values for the Series with True for if the original Series value contains the substring and False if not.

Is Iterrows faster than apply?

The results show that apply massively outperforms iterrows . As mentioned previously, this is because apply is optimized for looping through dataframe rows much quicker than iterrows does. While slower than apply , itertuples is quicker than iterrows , so if looping is required, try implementing itertuples instead.


1 Answers

df.apply(..., axis=1) is is very slow! you should avoid to use it!

from random import sample
from string import lowercase
from pandas import DataFrame

df = DataFrame({
    'a': map(lambda x: ''.join(sample(lowercase, 2)), range(100000)),
    'b': map(lambda x: ''.join(sample(lowercase, 5)), range(100000))
})

%time map(lambda (x, y): x in y, zip(df['a'], df['b']))

%time df.apply(lambda x: x[0] in x[1], axis=1)
like image 141
xmduhan Avatar answered Oct 20 '22 01:10

xmduhan