How do I include end-of-string and a non-digit character in a Python 2.6 regular expression set for searching?
I want to find 10-digit numbers with a non-digit at the beginning and a non-digit or end-of-string at the end. It is a 10-digit ISBN number and 'X' is valid for the final digit.
The following do not work:
is10 = re.compile(r'\D(\d{9}[\d|X|x])[$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\Z|\D]')
The problem arises with the last set: [\$|\D] to match a non-digit or end-of-string.
Test with:
line = "abcd0123456789"
m = is10.search(line)
print m.group(1)
line = "abcd0123456789efg"
m = is10.search(line)
print m.group(1)
You have to group the alternatives with parentheses, not brackets:
r'\D(\d{9}[\dXx])($|\D)'
| is a different construct than []. It marks an alternative between two patterns, while [] matches exactly one of the contained characters. So | should only be used inside of [] if you want to match the actual character |. For the same reason, $ inside [] is a literal dollar sign, not the end-of-string anchor. Grouping of parts of patterns is done with parentheses, so these should be used to restrict the scope of the alternative marked by |.
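To see the difference concretely, here is a small sketch. The variable names are made up for illustration, and the patterns are cut-down versions of the ones in the question rather than the full ISBN expression. It avoids Python-3-only syntax, so it should also run on 2.6.

```python
import re

# Inside [], '|' is an ordinary character, so this class also accepts a literal '|'.
last_char = re.compile(r'^[\d|X|x]$')
pipe_matches = last_char.match('|') is not None

# A character class always consumes exactly one character, so it can never
# match end-of-string; '$' inside [] is just a literal dollar sign.
tail_class = re.compile(r'\d[$|\D]')
class_at_end = tail_class.search('5') is not None
class_dollar = tail_class.search('5$') is not None

# Outside [], '|' separates alternatives and '$' is the end-of-string anchor.
tail_group = re.compile(r'\d($|\D)')
group_at_end = tail_group.search('5') is not None

print(pipe_matches)   # True
print(class_at_end)   # False
print(class_dollar)   # True
print(group_at_end)   # True
```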
If you don't want this to create an extra match group, you can use a non-capturing group, (?: ), instead:
r'\D(\d{9}[\dXx])(?:$|\D)'
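As a quick check, the pattern can be run against the two test lines from the question plus two extra inputs added here for illustration (an 'X' check digit and an 11-digit run). The helper name find_isbn10 is hypothetical, and the sketch returns None on a failed search instead of calling group() on it.

```python
import re

is10 = re.compile(r'\D(\d{9}[\dXx])(?:$|\D)')

def find_isbn10(line):
    """Return the first 10-character ISBN candidate in line, or None."""
    m = is10.search(line)
    return m.group(1) if m else None

print(find_isbn10("abcd0123456789"))      # 0123456789 (followed by end-of-string)
print(find_isbn10("abcd0123456789efg"))   # 0123456789 (followed by a non-digit)
print(find_isbn10("abcd012345678Xefg"))   # 012345678X
print(find_isbn10("abcd01234567890"))     # None (11 digits in a row)
print(is10.groups)                        # 1, since (?: ) does not capture
```

Note that the leading \D still requires an actual non-digit character before the number, so a line that begins directly with the ISBN will not match this pattern.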