How to filter rows in pandas by regex

People also ask

How do I filter specific rows from a DataFrame Pandas?

You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame containing only the selected rows.
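A minimal sketch of that condition filter. Only the Courses column name comes from the answer above; the DataFrame contents and the Fee column are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the "Courses" column name comes from the text above
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Hadoop"],
    "Fee": [20000, 25000, 22000, 23000],
})

# The comparison builds a boolean Series; indexing with it keeps the True rows
spark_rows = df[df["Courses"] == "Spark"]
print(spark_rows)

# df itself is unchanged; spark_rows is a separate DataFrame
```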

Can you use regex in Pandas?

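If you need to extract data that matches a regex pattern from a column in a pandas DataFrame, you can use the Series.str.extract() method. Other string methods, such as str.contains() and str.match(), also accept regular expressions.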
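A small sketch of str.extract on a hypothetical column. With expand=False and a single capture group it returns a Series, and rows where the pattern does not match come back as NaN, which also gives you a way to filter:

```python
import pandas as pd

# Hypothetical column of strings
s = pd.Series(["order-123", "order-45", "refund"])

# Capture the digits after "order-"; non-matching rows become NaN
numbers = s.str.extract(r"order-(\d+)", expand=False)

# Keep only the rows where the pattern matched
matched = s[numbers.notna()]
print(numbers)
print(matched)
```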

How do you slice rows in Pandas?

When slicing by index position in pandas (for example with df.iloc), the start index is included in the output, but the stop index is one step beyond the last row you want to select. For example, the row slice 0:2 returns row 0 and row 1, but not row 2, and a column slice of [:] indicates that all columns are required.
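The positional slicing rule can be sketched with DataFrame.iloc on a hypothetical four-row DataFrame:

```python
import pandas as pd

# Hypothetical 4-row DataFrame
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": ["w", "x", "y", "z"]})

# Rows 0:2 -> positions 0 and 1 (the stop position is excluded); ":" selects all columns
first_two = df.iloc[0:2, :]
print(first_two)
```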

Use contains instead:

In [10]: df.b.str.contains('^f')
Out[10]: 
0    False
1     True
2     True
3    False
Name: b, dtype: bool
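To turn that boolean Series into filtered rows, index the DataFrame with it. The data below is an assumption, chosen only to be consistent with the outputs shown in this thread:

```python
import pandas as pd

# Hypothetical data consistent with the outputs shown in this thread
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["hi", "foo", "fat", "cat"]})

# str.contains treats the pattern as a regex by default; "^f" means "starts with f"
mask = df.b.str.contains("^f")

# Boolean indexing keeps only the rows where the mask is True
filtered = df[mask]
print(filtered)
```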

There is already a string handling function Series.str.startswith(). You should try foo[foo.b.str.startswith('f')].

Result:

    a   b
1   2   foo
2   3   fat

which I think is what you expect.
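A quick sketch showing that the plain prefix check and the ^f regex select the same rows on the same hypothetical data (named foo here, as in the answer):

```python
import pandas as pd

# Hypothetical data, named foo as in the answer above
foo = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["hi", "foo", "fat", "cat"]})

# startswith is a literal prefix test; no regex is involved
by_prefix = foo[foo.b.str.startswith("f")]

# Regex equivalent, for comparison
by_regex = foo[foo.b.str.contains("^f")]
print(by_prefix)
```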

Alternatively, you can use contains with the regex option. For example:

foo[foo.b.str.contains('oo', regex=True, na=False)]

Result:

    a   b
1   2   foo

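na=False prevents errors when the column contains missing values (NaN, None, etc.): without it, those rows get NaN instead of True/False in the mask, and pandas refuses to index with a mask that contains NaN.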
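A sketch of the na=False behaviour on hypothetical data with one missing value. The comment about the NaN mask describes the usual object-dtype behaviour; the exact result without na=False may differ in newer pandas versions that use a dedicated string dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in column b
foo = pd.DataFrame({"a": [1, 2, 3], "b": ["foo", np.nan, "bar"]})

# With an object-dtype column, the missing entry yields NaN here rather than a bool,
# and foo[raw_mask] would then raise a ValueError
raw_mask = foo.b.str.contains("oo", regex=True)

# na=False turns missing values into "no match", so the mask is purely boolean
safe_mask = foo.b.str.contains("oo", regex=True, na=False)
print(foo[safe_mask])
```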

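It may be a bit late, but this is now easier to do in pandas by calling Series.str.match. The docs explain the difference between match, fullmatch and contains: match anchors the pattern at the start of each string, fullmatch requires the whole string to match, and contains matches anywhere in the string.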

Note that to use the results for indexing, you should set na=False (or na=True if you want rows with NaN values included in the results).
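A small comparison of the three methods on a hypothetical Series, with na=False so each result can be used directly as a mask. Series.str.fullmatch needs pandas 1.1 or later:

```python
import pandas as pd

# Hypothetical values, including a missing one
s = pd.Series(["foo", "food", "afoo", None])

starts = s.str.match("foo", na=False)        # anchored at the start, like re.match
whole = s.str.fullmatch("foo", na=False)     # whole string must match, like re.fullmatch
anywhere = s.str.contains("foo", na=False)   # match anywhere, like re.search

print(s[starts])
```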

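Multiple-column search with a DataFrame (each str.match call returns a boolean Series, and & combines them row by row):

frame[frame.filename.str.match('.*' + MetaData + '.*') & frame.file_path.str.match(r'C:\\test\\test\.txt')]

Note that '.*' (not '*.') is the regex for "any characters", and the Windows path is written as a raw string with its backslashes and dot escaped, so it is matched literally instead of being read as tab characters and a wildcard.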
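A runnable sketch of the multi-column filter. The file listing and the value of MetaData are hypothetical; re.escape is used here as one way to match the path literally, and fullmatch is used so the path must match exactly rather than just as a prefix:

```python
import re

import pandas as pd

# Hypothetical file listing; column names follow the example above
frame = pd.DataFrame({
    "filename": ["report_2021.txt", "notes.txt", "report_2022.txt"],
    "file_path": [r"C:\test\test.txt", r"C:\test\test.txt", r"C:\other\x.txt"],
})
MetaData = "report"  # hypothetical search term

# Each condition is a boolean Series; & combines them element-wise
mask = (
    frame.filename.str.match(".*" + MetaData + ".*")
    # re.escape makes the backslashes and the dot literal
    & frame.file_path.str.fullmatch(re.escape(r"C:\test\test.txt"))
)
print(frame[mask])
```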

Building on the answer by user3136169, here is an example of how that might be done while also skipping None (and other non-string) values.

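import re

# `regex` is the pattern string you want to match, defined elsewhere
def regex_filter(val):
    # Skip None, NaN and other non-strings instead of letting re.search raise TypeError
    if not isinstance(val, str):
        return False
    return re.search(regex, val) is not None

df_filtered = df[df['col'].apply(regex_filter)]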
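A self-contained sketch of that apply-based filter on hypothetical data containing both None and NaN, checked against the vectorized str.contains equivalent (the pattern r"^f" is an assumption for illustration):

```python
import re

import numpy as np
import pandas as pd

regex = r"^f"  # hypothetical pattern for this sketch

def regex_filter(val):
    # None, NaN and other non-string values count as "no match"
    if not isinstance(val, str):
        return False
    return re.search(regex, val) is not None

df = pd.DataFrame({"col": ["foo", None, "bar", np.nan, "fat"]})

# Row-by-row Python function
via_apply = df[df["col"].apply(regex_filter)]

# Vectorized equivalent; na=False also treats missing values as "no match"
via_contains = df[df["col"].str.contains(regex, regex=True, na=False)]
print(via_apply)
```

The vectorized version is usually the simpler choice; the apply version is mainly useful when the test needs logic beyond a single regex.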

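You can also pass the regex as an argument. Series.apply forwards extra keyword arguments to the function, so the keyword must match the parameter name:

def regex_filter(val, regex):
    ...  # same body as above, now using the regex argument

df_filtered = df[df['col'].apply(regex_filter, regex=myregex)]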
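A runnable sketch of passing the pattern through Series.apply, either as a keyword argument or positionally via args=(...). The data and the value of myregex are hypothetical:

```python
import re

import pandas as pd

def regex_filter(val, regex):
    # Non-strings (None, NaN) are treated as "no match"
    if not isinstance(val, str):
        return False
    return re.search(regex, val) is not None

df = pd.DataFrame({"col": ["foo", "bar", "fat"]})
myregex = r"^f"  # hypothetical pattern

# Extra keyword arguments to Series.apply are forwarded to regex_filter
by_keyword = df[df["col"].apply(regex_filter, regex=myregex)]

# Extra positional arguments go through args=(...)
by_args = df[df["col"].apply(regex_filter, args=(myregex,))]
print(by_keyword)
```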

Write a Boolean function that checks the regex (called regex_function here) and use apply on the column:

foo[foo['b'].apply(regex_function)]
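One way to build such a regex_function is a small factory that compiles the pattern once and returns the predicate. make_regex_function is a hypothetical helper, and the data is the same assumed DataFrame used earlier in this thread:

```python
import re

import pandas as pd

def make_regex_function(pattern):
    """Hypothetical helper: return a predicate that tests one value against pattern."""
    compiled = re.compile(pattern)  # compile once, reuse for every row

    def regex_function(val):
        # Non-strings are treated as "no match"
        return isinstance(val, str) and compiled.search(val) is not None

    return regex_function

foo = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["hi", "foo", "fat", "cat"]})
regex_function = make_regex_function(r"^f")
result = foo[foo["b"].apply(regex_function)]
print(result)
```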
