What is the fastest way to select rows that contain a value in a Pandas dataframe?

Tags: performance, python, string, pandas

I am currently following the instructions laid out in "Most concise way to select rows where any column contains a string in Pandas dataframe?" for finding values, and it works. The only problem is that my dataframe is quite big (5x3500 rows) and I need to perform around 2000 searches. Each one takes around 4 seconds, so this adds up and has become unsustainable on my end.

Is there a faster way to search for all rows containing a string value than this?

df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)]


asked Feb 02 '19 01:02

NBC

2 Answers

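You can test the speed of this NumPy approach (note that, as written, the search is case-sensitive):

import numpy as np

# Cast every cell to str, search the flattened array, then restore the 2-D shape
boolfilter=(np.char.find(df.values.ravel().astype(str),'b')!=-1).reshape(df.shape).any(axis=1)
boolfilter
array([False,  True,  True])
newdf=df[boolfilter]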
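
To actually compare the speed, a small timing harness along these lines may help. This is only a sketch: the random sample frame, the function names `with_apply` and `with_numpy`, and the repeat count are assumptions made for illustration, and the NumPy variant lowercases the data so that both functions do a case-insensitive search like the one in the question.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sample data: 3500 rows x 5 columns of random 4-letter strings
rng = np.random.default_rng(0)
letters = np.array(list("abcdefghij"))
words = ["".join(rng.choice(letters, size=4)) for _ in range(3500 * 5)]
df = pd.DataFrame(np.array(words, dtype=object).reshape(3500, 5))


def with_apply(frame, needle):
    # Row-wise search, as in the question (regex disabled)
    mask = frame.apply(
        lambda r: r.str.contains(needle, case=False, regex=False).any(), axis=1
    )
    return frame[mask]


def with_numpy(frame, needle):
    # Flatten to a str array, lowercase it, search once, reshape back to rows
    flat = np.char.lower(frame.values.ravel().astype(str))
    mask = (np.char.find(flat, needle.lower()) != -1).reshape(frame.shape).any(axis=1)
    return frame[mask]


for func in (with_apply, with_numpy):
    seconds = timeit.timeit(lambda: func(df, "b"), number=3) / 3
    print(f"{func.__name__}: {seconds:.4f} s per search")
```

The absolute numbers depend on the machine, the pandas and NumPy versions, and the real data, so it is worth rerunning this on the actual dataframe before picking an approach.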

answered Oct 21 '22 17:10

BENY

One trivial possibility is to disable regex:

res = df[df.apply(lambda r: r.str.contains('b', case=False, regex=False).any(), axis=1)]

Another way using a list comprehension:

res = df[[any('b' in x.lower() for x in row) for row in df.values]]
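
Since the question involves around 2000 searches over the same dataframe, one further option (not part of the answers above, just a sketch) is to do the string conversion and lowercasing once and reuse the result for every search. The example frame, the separator `SEP`, and the helper name `rows_containing` are hypothetical; the approach assumes there are no missing values and that the separator never occurs in the data or in a search term, otherwise a match could span two columns.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({
    "col1": ["apple", "Banana", "cherry"],
    "col2": ["dog", "cat", "Bird"],
})

SEP = "\x1f"  # assumption: never appears in the data or in a search term

# One-time cost: collapse each row into a single lowercase string
haystack = pd.Series(
    map(SEP.join, df.astype(str).to_numpy()), index=df.index
).str.lower()


def rows_containing(frame, haystack, needle):
    # A single vectorised substring search per query, with regex disabled
    mask = haystack.str.contains(needle.lower(), regex=False)
    return frame[mask]


result = rows_containing(df, haystack, "b")
print(result)
```

Each later search then runs one `str.contains` call over a single Series instead of a Python-level function call per row.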

answered Oct 21 '22 15:10

jpp
