
Python regular expression to filter list of strings matching a pattern

I use R a lot more and it is easier for me to do it in R:

> test <- c('bbb', 'ccc', 'axx', 'xzz', 'xaa')
> test[grepl("^x",test)]
[1] "xzz" "xaa"

But how do I do this in Python if test is a list?

P.S. I am learning Python using Google's Python exercises, and I would prefer to use regular expressions.

lokheart asked Mar 14 '13 06:03




1 Answer

In general, you may use

import re                                  # Import the re module to use regular expressions
test = ['bbb', 'ccc', 'axx', 'xzz', 'xaa'] # Define a test list
reg = re.compile(r'^x')                    # Compile the regex
matches = list(filter(reg.search, test))   # filter() returns an iterator; list() materializes it
# => ['xzz', 'xaa']

Or, to invert the results and get all items that do not match the regex:

list(filter(lambda x: not reg.search(x), test))
# => ['bbb', 'ccc', 'axx']

See the Python demo.
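
A list comprehension is a common alternative to filter() for this kind of selection. The following is only a sketch of the same idea, not part of the original answer; the names matching and non_matching are chosen here for illustration.

```python
import re

test = ['bbb', 'ccc', 'axx', 'xzz', 'xaa']
reg = re.compile(r'^x')

# Keep the items where the pattern is found
matching = [s for s in test if reg.search(s)]

# Keep the items where the pattern is NOT found
non_matching = [s for s in test if not reg.search(s)]

print(matching)      # ['xzz', 'xaa']
print(non_matching)  # ['bbb', 'ccc', 'axx']
```

Because neither comprehension reassigns test, both results are computed from the original list.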

USAGE NOTE:

  • re.search finds the first regex match anywhere in a string and returns a match object, or None if there is no match.
  • re.match looks for a match only at the start of the string; it does NOT require a full-string match. So re.search(r'^x', text) is equivalent to re.match(r'x', text), as long as the re.MULTILINE flag is not set.
  • re.fullmatch returns a match only if the full string matches the pattern, so re.fullmatch(r'x', text) is equivalent to re.match(r'x\Z', text) and re.search(r'^x\Z', text), again without re.MULTILINE.
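
The differences between the three functions can be checked with a small sketch like the one below. The sample strings and variable names are chosen here for illustration only.

```python
import re

# 'axx' contains an 'x', but not at the start
search_anywhere = bool(re.search(r'x', 'axx'))    # True: 'x' occurs somewhere
match_at_start = bool(re.match(r'x', 'axx'))      # False: string starts with 'a'
search_anchored = bool(re.search(r'^x', 'axx'))   # False: same as re.match(r'x', ...)

# 'xzz' starts with 'x' but is longer than the pattern
match_prefix = bool(re.match(r'x', 'xzz'))        # True: only the start must match
full_on_longer = bool(re.fullmatch(r'x', 'xzz'))  # False: the whole string must match
full_on_exact = bool(re.fullmatch(r'x', 'x'))     # True

print(search_anywhere, match_at_start, search_anchored)
print(match_prefix, full_on_longer, full_on_exact)
```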

If you wonder what the r'' prefix means, see the questions "Python - Should I be using string prefix r when looking for a period (full stop or .) using regex?" and "Python regex - r prefix".
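
As a brief illustration of the r'' prefix: in a raw string literal, backslashes are kept as-is rather than being treated as string escapes, which is convenient for regex patterns. The file names below are made up for the example.

```python
import re

# A raw string keeps the backslash; the regular string needs it doubled
raw_pattern = r'\.'
escaped_pattern = '\\.'
same_pattern = (raw_pattern == escaped_pattern)  # both are backslash + dot

# r'\n' is two characters (backslash, n); '\n' is one newline character
raw_len = len(r'\n')
plain_len = len('\n')

# Hypothetical file names, filtered by a literal ".txt" suffix
filenames = ['a.txt', 'b_txt', 'c.md']
txt_files = [f for f in filenames if re.search(r'\.txt$', f)]

print(same_pattern, raw_len, plain_len)  # True 2 1
print(txt_files)                         # ['a.txt']
```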

Wiktor Stribiżew answered Sep 23 '22 01:09