I have a pandas dataframe that I'd like to filter by a specific word (test) in a column. I tried:

<code>df[df[col].str.contains('test')]</code>

But it returns an empty dataframe with just the column names. For the output, I'm looking for a dataframe that contains all rows that contain the word 'test'. What can I do?

EDIT (to add samples):

<code>data = pd.read_csv(/...csv)</code>

data has 5 columns, including <code>'BusinessDescription'</code>, and I want to extract all rows that have the word 'dental' (case-insensitive) in the <code>BusinessDescription</code> column, so I used:

<code>filtered = data[data['BusinessDescription'].str.contains('dental')==True]</code>

and I get an empty dataframe with just the header names of the 5 columns.


how to filter pandas dataframe by string?


2 Answers

It seems you need the flags parameter of str.contains:


import re

filtered = data[data['BusinessDescription'].str.contains('dental', flags=re.IGNORECASE)]
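For a plain case-insensitive search, the case parameter of Series.str.contains should give the same result as the regex flag. Here is a minimal sketch, assuming a small made-up BusinessDescription column shaped like the one in the question:

```python
import re

import pandas as pd

# Hypothetical sample data shaped like the question's column
df = pd.DataFrame({'BusinessDescription': ['dental fluss', 'DENTAL', 'Dentist']})

# Case-insensitive match via the regex flag
by_flags = df[df['BusinessDescription'].str.contains('dental', flags=re.IGNORECASE)]

# Same result using the case parameter instead of a regex flag
by_case = df[df['BusinessDescription'].str.contains('dental', case=False)]

print(by_case)
```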

Another solution (thanks Anton vBR) is to convert to lowercase first:


filtered = data[data['BusinessDescription'].str.lower().str.contains('dental')]
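One thing to watch for: if BusinessDescription has missing values (for example, empty cells in the CSV), str.contains returns NaN for those rows. Using that result directly as a boolean mask then raises a ValueError. The na parameter fills those positions. Here is a sketch with a hypothetical row containing NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing description, as read_csv produces for an empty cell
df = pd.DataFrame({'BusinessDescription': ['dental fluss', np.nan, 'DENTAL', 'Dentist']})

# na=False treats missing values as "no match", so the mask contains only True/False
mask = df['BusinessDescription'].str.lower().str.contains('dental', na=False)
filtered = df[mask]

print(filtered)
```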

For future programming, I'd recommend using the variable name df instead of data when referring to DataFrames; it is the common convention on SO.

Example:


import pandas as pd

data = dict(BusinessDescription=['dental fluss','DENTAL','Dentist'])
df = pd.DataFrame(data)
df[df['BusinessDescription'].str.lower().str.contains('dental')]

  BusinessDescription
0        dental fluss
1              DENTAL
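The question asks for the word 'dental'. A plain substring search also matches longer words that merely contain it, such as a hypothetical 'dentalcare'. If only whole words should count, a regex word boundary (\b) can be combined with case=False. The sample values below are made up for illustration:

```python
import pandas as pd

# Hypothetical values: the last one contains 'dental' only as part of a longer word
df = pd.DataFrame({'BusinessDescription': ['dental fluss', 'DENTAL', 'Dentist', 'dentalcare']})

# \b anchors the match at word boundaries; case=False makes it case-insensitive
whole_word = df[df['BusinessDescription'].str.contains(r'\bdental\b', case=False)]

# Plain substring search for comparison
substring = df[df['BusinessDescription'].str.contains('dental', case=False)]

print(whole_word)
print(substring)
```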

Timings:


d = dict(BusinessDescription=['dental fluss','DENTAL','Dentist'])
data = pd.DataFrame(d)
data = pd.concat([data]*10000).reset_index(drop=True)

#print (data)

In [122]: %timeit data[data['BusinessDescription'].str.contains('dental', flags = re.IGNORECASE)]
10 loops, best of 3: 28.9 ms per loop

In [123]: %timeit data[data['BusinessDescription'].str.lower().str.contains('dental')]
10 loops, best of 3: 32.6 ms per loop

Caveat:

Performance really depends on the data: the size of the DataFrame and the number of values matching the condition.
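Because the outcome depends on the data, it is worth timing both variants on your own frame. Here is a minimal sketch that uses the standard timeit module outside IPython. The repeated sample data stands in for your real column:

```python
import re
import timeit

import pandas as pd

# Stand-in data; replace with your own DataFrame
data = pd.DataFrame({'BusinessDescription': ['dental fluss', 'DENTAL', 'Dentist']})
data = pd.concat([data] * 10000).reset_index(drop=True)
col = data['BusinessDescription']


def with_flags():
    # Case-insensitive regex search
    return data[col.str.contains('dental', flags=re.IGNORECASE)]


def with_lower():
    # Lowercase first, then a case-sensitive search
    return data[col.str.lower().str.contains('dental')]


# Best of 3 repeats of 10 calls each, reported per call in milliseconds
for name, func in [('flags', with_flags), ('lower', with_lower)]:
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f'{name}: {best * 1000:.1f} ms per call')
```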


answered Sep 21 '22 12:09

jezrael

Keep the string enclosed in quotes.


df[df['col'].str.contains('test')]

Thanks
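The difference is this: in the question, col without quotes is looked up as a Python variable, so df[col] only works if col holds the column name. Quoting the name, or assigning it to a variable first, selects the column. Here is a sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a column literally named 'col'
df = pd.DataFrame({'col': ['a test row', 'something else']})

# Literal column name in quotes
by_literal = df[df['col'].str.contains('test')]

# Equivalent: a variable that holds the column name
col = 'col'
by_variable = df[df[col].str.contains('test')]

print(by_literal)
```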

answered Sep 22 '22 12:09

Nephilim


Tags: python, regex, pandas, filter

eh2699
