Pandas efficient check if column contains string in other column

Tags:

python

pandas

I'm trying to get a boolean index of whether one column contains a string from the same row in another column:

a      b
boop   beep bop
zorp   zorpfoo
zip    foo zip fa

Checking whether the string in column a appears in the same row of column b, I'd like to get:

[False, True, True]

Right now I'm trying this approach, but it is slow:

df.apply(lambda row: row['a'] in row['b'], axis=1)
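For reference, here is a minimal sketch that rebuilds the sample frame from above (columns `a` and `b`, as in the question) and confirms that the row-wise `apply` produces the expected mask:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# Row-wise check: is the value in 'a' a substring of the value in 'b'?
mask = df.apply(lambda row: row['a'] in row['b'], axis=1)
print(mask.tolist())  # [False, True, True]
```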

Is there a .str method for this?
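As far as I know, `Series.str.contains` takes one pattern and applies it to every row. It does not accept a different pattern per row taken from another column. A common workaround, sketched here under the assumption that plain substring (non-regex) matching is wanted, is a list comprehension over the two zipped columns:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['boop', 'zorp', 'zip'],
    'b': ['beep bop', 'zorpfoo', 'foo zip fa'],
})

# str.contains checks ONE fixed pattern against every row of 'b'
fixed = df['b'].str.contains('zip', regex=False)
print(fixed.tolist())  # [False, False, True]

# Per-row check: pair each 'a' with the 'b' from the same row
mask = pd.Series([a in b for a, b in zip(df['a'], df['b'])], index=df.index)
print(mask.tolist())  # [False, True, True]

# The boolean Series can be used directly to select rows
print(df[mask])
```

Using plain `in` avoids regex escaping issues when the values in `a` contain characters such as `.` or `*`.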

asked Oct 20 '15 19:10

Luke

1 Answer

df.apply(..., axis=1) is very slow! You should avoid using it!

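from random import sample
from string import ascii_lowercase  # Python 3 name for string.lowercase
from pandas import DataFrame

# 100,000 rows: 'a' holds 2 random letters, 'b' holds 5 random letters
df = DataFrame({
    'a': [''.join(sample(ascii_lowercase, 2)) for _ in range(100000)],
    'b': [''.join(sample(ascii_lowercase, 5)) for _ in range(100000)]
})

# Plain Python comprehension over the zipped columns (IPython %time magic)
%time [x in y for x, y in zip(df['a'], df['b'])]

# Row-wise apply, the slow approach
%time df.apply(lambda row: row['a'] in row['b'], axis=1)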
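The `%time` lines above only work in IPython or Jupyter. A plain-script sketch of the same comparison using the standard `timeit` module might look like this; the helper names `with_zip` and `with_apply` are illustrative, and no particular timings are implied since they depend on the machine and pandas version:

```python
import timeit
from random import sample
from string import ascii_lowercase

import pandas as pd

N = 100_000  # same size as the benchmark above
df = pd.DataFrame({
    'a': [''.join(sample(ascii_lowercase, 2)) for _ in range(N)],
    'b': [''.join(sample(ascii_lowercase, 5)) for _ in range(N)],
})

def with_zip():
    # Pure-Python comprehension over the two columns
    return [x in y for x, y in zip(df['a'], df['b'])]

def with_apply():
    # Row-wise apply: pandas builds a Series object for every row
    return df.apply(lambda row: row['a'] in row['b'], axis=1).tolist()

# Both approaches must agree before comparing speed
assert with_zip() == with_apply()

for fn in (with_zip, with_apply):
    seconds = timeit.timeit(fn, number=1)
    print(f'{fn.__name__}: {seconds:.3f} s')
```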

answered Oct 20 '22 01:10

xmduhan
