
Python - Use a Regex to Filter Data

Tags: python, regex

Is there a simple way to remove all characters from a given string that match a given regular expression? I know in Ruby I can use gsub:

>> key = "cd baz ; ls -l"
=> "cd baz ; ls -l"
>> newkey = key.gsub(/[^\w\d]/, "")
=> "cdbazlsl"

What would the equivalent function be in Python?

Chris Bunch asked Aug 16 '09 17:08



2 Answers

import re
# Replace every match of pattern in s with the empty string; returns a new string
re.sub(pattern, '', s)

See the Python documentation for re.sub.
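As a minimal sketch, here is the Ruby snippet from the question carried over to Python, reusing the question's own input string and character class. The variable names key and newkey are taken from the question; nothing else is assumed.

```python
import re

key = "cd baz ; ls -l"

# Same character class as the Ruby gsub call: drop every character
# that is not a word character (\d is already covered by \w).
newkey = re.sub(r"[^\w\d]", "", key)

print(newkey)  # cdbazlsl
```

Note that re.sub returns a new string and leaves key unchanged, much like gsub without the bang in Ruby.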

SilentGhost answered Oct 03 '22 23:10


The answers so far have focused on doing the same thing as your Ruby code, which is exactly the reverse of what you're asking in the English part of your question: the code removes characters that DO match, while your text asks for:

a simple way to remove all characters from a given string that fail to match

For example, suppose your RE's pattern was r'\d{2,}', "two or more digits" -- so the non-matching parts would be all non-digits plus all single, isolated digits. Removing the NON-matching parts, as your text requires, is also easy:

>>> import re
>>> there = re.compile(r'\d{2,}')
>>> ''.join(there.findall('123foo7bah45xx9za678'))
'12345678'

Edit: OK, OP's clarified the question now (he did indeed mean what his code, not his text, said, and now the text is right too;-) but I'm leaving the answer in for completeness (the other answers suggesting re.sub are correct for the question as it now stands). I realize you probably mean what you "say" in your Ruby code, and not what you say in your English text, but, just in case, I thought I'd better complete the set of answers!-)
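For comparison, the two readings of the question can be put side by side. This is only a sketch: keep_matches and remove_matches are hypothetical helper names made up for illustration, not part of the re module. The keep_matches version uses re.finditer rather than findall, since findall returns the captured groups instead of the whole match when the pattern contains capturing groups.

```python
import re

def keep_matches(pattern, s):
    """Return only the parts of s that match pattern, joined together."""
    # group(0) is always the whole match, even if pattern has groups.
    return "".join(m.group(0) for m in re.finditer(pattern, s))

def remove_matches(pattern, s):
    """Return s with every match of pattern removed."""
    return re.sub(pattern, "", s)

s = "123foo7bah45xx9za678"
print(keep_matches(r"\d{2,}", s))    # 12345678
print(remove_matches(r"\d{2,}", s))  # foo7bahxx9za
```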

Alex Martelli answered Oct 04 '22 00:10