 

Regex find non digit and/or end of string

Tags: python, regex

How do I include end-of-string and a single non-digit character in a Python 2.6 regular expression set for searching?

I want to find 10-digit numbers preceded by a non-digit and followed by either a non-digit or the end of the string. These are 10-digit ISBNs, and 'X' is valid as the final digit.

The following do not work:

is10 = re.compile(r'\D(\d{9}[\d|X|x])[$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\Z|\D]')

The problem is the last set, [\$|\D], which is supposed to match a non-digit or the end of the string.
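A short check of why the set fails (this goes beyond the original question and is only an illustration): inside [...], $ and | lose their special meaning and become literal characters, so [$|\D] always has to consume one character and can never match the empty end-of-string position. print(...) is used so the snippet runs on both Python 2.6 and Python 3.

```python
import re

# Outside a set, $ is an anchor: it matches the empty position at the end.
print(re.search(r'$', 'abc').start())      # 3
# Inside a set, $ is an ordinary character: it needs a literal '$' to match.
print(re.search(r'[$]', 'abc'))            # None
print(re.search(r'[$]', 'a$c').group())    # $
# Likewise, | inside a set is a literal pipe, not alternation.
print(re.search(r'[|]', 'a|c').group())    # |

# So the question's final set must consume a character, which is why a
# number at the very end of the string is never found.
broken = re.compile(r'\D(\d{9}[\d|X|x])[$|\D]')
print(broken.search("abcd0123456789"))     # None
```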

Test with:

line = "abcd0123456789"
m = is10.search(line)
print m.group(1)

line = "abcd0123456789efg"
m = is10.search(line)
print m.group(1)
asked Sep 29 '09 17:09 by Clinton



1 Answer

You have to group the alternatives with parentheses, not brackets:

r'\D(\d{9}[\dXx])($|\D)'
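As a quick sanity check, here is the pattern above run against the question's two test strings, plus a third string ending in 'X' that is invented for illustration. print(...) replaces the question's Python 2 print statements so the snippet also runs on Python 3.

```python
import re

# ($|\D) is a group whose alternatives are the end-of-string anchor
# or one non-digit character.
is10 = re.compile(r'\D(\d{9}[\dXx])($|\D)')

for line in ["abcd0123456789", "abcd0123456789efg", "isbn:012345678X"]:
    m = is10.search(line)
    print(m.group(1))
# Prints 0123456789, 0123456789, 012345678X (one per line).
```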

| is a different construct from []. It marks an alternative between two patterns, while [] matches any one of the characters it contains. Inside [], | should only be used if you want to match a literal | character; that is also why the pipes are dropped from [\d|X|x] above. To restrict the scope of an alternative marked by |, group that part of the pattern with parentheses.
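Both points can be checked directly. The input strings below are made up for illustration: a pipe inside the set is accepted as a "final digit", and leaving out the parentheses makes | split the whole pattern in two.

```python
import re

# With | inside the set, the pipe is just one more allowed character.
with_pipes = re.compile(r'\D(\d{9}[\d|X|x])($|\D)')
without_pipes = re.compile(r'\D(\d{9}[\dXx])($|\D)')

line = "abc012345678|"
print(with_pipes.search(line).group(1))   # 012345678|
print(without_pipes.search(line))         # None

# Without parentheses, | splits the *whole* pattern into two alternatives:
# r'\D(\d{9}[\dXx])$' or just r'\D'.
ungrouped = re.compile(r'\D(\d{9}[\dXx])$|\D')
m = ungrouped.search("abcd0123456789efg")
print(m.group())     # a
print(m.group(1))    # None
```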

If you don't want this to create an additional match group, use (?: ) instead:

r'\D(\d{9}[\dXx])(?:$|\D)'
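The difference shows up in groups() and especially in findall(), which returns tuples when a pattern has more than one group. The input strings are invented for illustration.

```python
import re

capturing = re.compile(r'\D(\d{9}[\dXx])($|\D)')
non_capturing = re.compile(r'\D(\d{9}[\dXx])(?:$|\D)')

line = "abcd0123456789efg"
print(capturing.search(line).groups())      # ('0123456789', 'e')
print(non_capturing.search(line).groups())  # ('0123456789',)

# With two groups, findall returns (number, trailing char) tuples;
# with one group it returns just the numbers.
text = "x0123456789 y123456789X"
print(capturing.findall(text))      # [('0123456789', ' '), ('123456789X', '')]
print(non_capturing.findall(text))  # ['0123456789', '123456789X']
```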
answered Oct 03 '22 08:10 by sth
