I have a pandas DataFrame in which I would like to check whether one column is contained in another.
df = pd.DataFrame({'A': ['some text here', 'another text', 'and this'],
                   'B': ['some', 'somethin', 'this']})
I would like to check whether df.B[0] is in df.A[0], df.B[1] is in df.A[1], and so on.
I have the following apply-based implementation:
df.apply(lambda x: x[1] in x[0], axis=1)
The result is a Series of [True, False, True], which is correct, but given my DataFrame's shape (millions of rows) it takes quite long. Is there a better (i.e. faster) implementation?
I tried the pandas.Series.str.contains approach, but it can only take a string as the pattern:
df['A'].str.contains(df['B'], regex=False)
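Since str.contains expects a scalar pattern rather than a Series, a plain element-wise check over the zipped columns works instead. A minimal runnable sketch (wrapping the result into an index-aligned Series is my addition, not part of the original snippet):

```python
import pandas as pd

df = pd.DataFrame({'A': ['some text here', 'another text', 'and this'],
                   'B': ['some', 'somethin', 'this']})

# Pair each row's B with its A via zip, then wrap the booleans
# back into a Series aligned with the original index.
mask = pd.Series([b in a for a, b in zip(df['A'], df['B'])], index=df.index)
print(mask.tolist())  # [True, False, True]
```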
Use np.vectorize: it bypasses the apply overhead, so it should be a bit faster.
v = np.vectorize(lambda x, y: y in x)
v(df.A, df.B)
array([ True, False, True], dtype=bool)
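An end-to-end version of the same approach, for completeness (converting the resulting array into an index-aligned Series is my addition; note that np.vectorize is still a Python-level loop under the hood, just with less per-row overhead than apply):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['some text here', 'another text', 'and this'],
                   'B': ['some', 'somethin', 'this']})

# Broadcast the substring test over both columns at once.
v = np.vectorize(lambda a, b: b in a)
result = pd.Series(v(df['A'], df['B']), index=df.index)
print(result.tolist())  # [True, False, True]
```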
Here's a timings comparison -
df = pd.concat([df] * 10000)
%timeit df.apply(lambda x: x[1] in x[0], axis=1)
1 loop, best of 3: 1.32 s per loop
%timeit v(df.A, df.B)
100 loops, best of 3: 5.55 ms per loop
# Psidom's answer
%timeit [b in a for a, b in zip(df.A, df.B)]
100 loops, best of 3: 3.34 ms per loop
Both are pretty competitive options!
Edit: adding timings for Wen's and MaxU's answers -
# Wen's answer
%timeit df.A.replace(dict(zip(df.B.tolist(),[np.nan]*len(df))),regex=True).isnull()
10 loops, best of 3: 49.1 ms per loop
# MaxU's answer
%timeit df['A'].str.split(expand=True).eq(df['B'], axis=0).any(1)
10 loops, best of 3: 87.8 ms per loop