
Python: return matching and non-matching parts of a string

I would like to split a string into a list containing the parts that match a regexp pattern and the parts that do not.

For example:

import re
string = 'my_file_10'
pattern = r'\d+$'
# I know the matching part can be obtained with:
m = re.search(pattern, string).group()
print(m)
# prints: 10
# The final result should be the following:
# ['my_file_', '10']
user1850133 asked Jun 27 '14 17:06



1 Answer

Put parentheses around the pattern to make it a capturing group, then use re.split() to produce a list of matching and non-matching elements:

pattern = r'(\d+$)'
re.split(pattern, string)

Demo:

>>> import re
>>> string = 'my_file_10'
>>> pattern = r'(\d+$)'
>>> re.split(pattern, string)
['my_file_', '10', '']

Because the match sits at the very end of the string, re.split() also returns the text that follows it, which is an empty string.
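
If the trailing empty string is unwanted, one option is to filter out empty parts after splitting. This is a minimal sketch that reuses string and pattern from the demo above; the name parts is only illustrative:

```python
import re

string = 'my_file_10'
pattern = r'(\d+$)'

# Keep only the non-empty pieces returned by re.split()
parts = [part for part in re.split(pattern, string) if part]
print(parts)  # ['my_file_', '10']
```

Note that this filter also drops any empty strings that re.split() produces at the start of the string or between adjacent matches, which may or may not be what you want with other patterns.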

If you only ever expect one match, at the end of the string (which the $ in your pattern forces here), then use re.search() and the match object's start() method to obtain an index for slicing the input string:

pattern = r'\d+$'
match = re.search(pattern, string)
not_matched, matched = string[:match.start()], match.group()

This returns:

>>> pattern = r'\d+$'
>>> match = re.search(pattern, string)
>>> string[:match.start()], match.group()
('my_file_', '10')
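
The slicing approach assumes that re.search() found a match; when nothing matches, it returns None and calling match.start() raises an AttributeError. A minimal sketch of a guard follows. The helper name split_trailing_match and its default pattern are hypothetical, not part of the re module:

```python
import re

def split_trailing_match(text, pattern=r'\d+$'):
    """Return [non-matching prefix, match], or [text] if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        # No match: return the input unchanged as a single-element list
        return [text]
    # Slice off everything before the match and pair it with the match itself
    return [text[:match.start()], match.group()]

print(split_trailing_match('my_file_10'))  # ['my_file_', '10']
print(split_trailing_match('my_file'))     # ['my_file']
```

Like the answer's slicing code, this only keeps the text before the match, so it suits patterns anchored with $ at the end of the string.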
Martijn Pieters answered Sep 24 '22 00:09