I am trying to match a pipe character in a string using a Python regex and I can't seem to get it to match. I've boiled it down to a simplified version.
Let's say I am looking for the sequence z|a in a string. Here are some possible regexes and the results:
>>> import re
>>> re.match(r'|', 'xyz|abc')
<_sre.SRE_Match object at 0x2d9a850>
>>> re.match(r'z|', 'xyz|abc')
<_sre.SRE_Match object at 0x2d9a780>
>>> re.match(r'|a', 'xyz|abc')
<_sre.SRE_Match object at 0x2d9a850>
>>> re.match(r'z|a', 'xyz|abc')
>>> re.match(r'z\|a', 'xyz|abc')
>>> re.match(r'z\\|a', 'xyz|abc')
>>> re.match(r'z\\\|a', 'xyz|abc')
>>> re.match(r'z[|]a', 'xyz|abc')
>>>
So I can match with |, |a and z| but I can't find a way to match z|a. Any ideas?
re.match() only looks for a match at the start of the string. Use re.search() instead, which scans the whole string.
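The patterns that do match are matching the empty string. An unescaped | is alternation, so r'|' means "empty string or empty string", r'z|' means "z or empty string", and r'|a' means "empty string or a". Each of these has an empty alternative, so it matches at the start of any string. Likewise r'z|a' means "z or a", which fails with re.match() here because the string starts with x.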
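A minimal sketch (assuming Python 3) that makes the empty matches visible by printing what was actually matched and where, instead of just the match object:

```python
import re

s = 'xyz|abc'

# Each pattern has an empty alternative, so re.match succeeds at
# position 0 by matching zero characters, whatever the string is.
results = {}
for pattern in (r'|', r'z|', r'|a'):
    m = re.match(pattern, s)
    results[pattern] = (m.group(), m.span())
    print(repr(pattern), '->', repr(m.group()), m.span())  # '' (0, 0)

# The unescaped pipe in r'z|a' means "z or a"; neither is at position 0.
print(re.match(r'z|a', s))  # None
```

Checking m.group() and m.span() is a quick way to tell a real match from an accidental zero-length one.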
>>> re.match('z\\|a', 'xyz|abc')
>>> re.search('z\\|a', 'xyz|abc')
<_sre.SRE_Match object at 0x02BF2BB8>
>>> re.search(r'z\|a', 'xyz|abc')
<_sre.SRE_Match object at 0x02BF2BF0>
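A small sketch (assuming Python 3) showing that, once re.search() is used, both the backslash-escaped pipe and the character-class form from the question find the literal z|a:

```python
import re

s = 'xyz|abc'

# Two equivalent ways to say "a literal pipe" in a pattern:
# escape it with a backslash, or put it inside a character class.
found = {}
for pattern in (r'z\|a', r'z[|]a'):
    m = re.search(pattern, s)  # search scans the whole string
    found[pattern] = (m.group(), m.span())
    print(pattern, '->', m.group(), m.span())  # z|a (2, 5)

# re.match still fails, because "z|a" does not start at index 0.
print(re.match(r'z\|a', s))  # None
```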
More generally, you can use re.escape() on a literal string that you need to embed in a more complex regular expression, so you don't have to work out how many backslashes are needed to escape its special characters.
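A minimal sketch of that approach (assuming Python 3); the surrounding pattern, the literal followed by a captured run of word characters, is only a hypothetical illustration:

```python
import re

literal = 'z|a'              # text we want to match verbatim
escaped = re.escape(literal)
print(escaped)               # z\|a

# Embed the escaped literal in a larger pattern: the literal, then a captured word.
pattern = re.compile(escaped + r'(\w+)')
m = pattern.search('xyz|abc')
print(m.group(0), m.group(1))  # z|abc bc
```

Because re.escape() handles the escaping, the same code works unchanged if the literal contains other special characters such as . or *.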