
Python Regex stops after first "|" match [closed]

Tags: python, regex

p = re.compile("[AG].{2}[ATG|ATA|AAG].{1}G")
regex_result = p.search('ZZZAXXATGXGZZZ')
regex_result.group()
'AXXATG'

I was expecting AXXATGXG instead.

fire_water asked Jan 24 '26 17:01


1 Answer

Use a grouping construct (...) rather than a character class [...] around the alternatives:

p = re.compile("[AG].{2}(?:ATG|ATA|AAG).G")
                        ^^^^^^^^^^^^^^^  

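The (?:ATG|ATA|AAG) non-capturing group matches one of three 3-character sequences: ATG, ATA, or AAG. The [ATG|ATA|AAG] character class matches a single character: A, T, G, or a literal |. That is why the original pattern stopped at AXXATG: the class consumed only the A of ATG, .{1} consumed the T, and the final G in the pattern matched the G of ATG.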
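
A minimal sketch of the difference between the two constructs, in case it helps to see it in isolation. The variable names (char_class, group, original) are illustrative and not from the question; the input string and both patterns are taken from it.

```python
import re

# Character class: matches exactly ONE character out of A, T, G or a literal '|'.
char_class = re.compile(r"[ATG|ATA|AAG]")
# Non-capturing group: matches one of the three-letter sequences.
group = re.compile(r"(?:ATG|ATA|AAG)")

print(char_class.fullmatch("|") is not None)    # True: '|' is just a member of the class
print(char_class.fullmatch("ATG") is not None)  # False: a class never spans three characters
print(group.fullmatch("ATG") is not None)       # True
print(group.fullmatch("|") is not None)         # False

# The question's pattern on the question's input:
# the class consumes only the 'A' of 'ATG', so the match ends early.
original = re.compile(r"[AG].{2}[ATG|ATA|AAG].{1}G")
original_match = original.search("ZZZAXXATGXGZZZ").group()
print(original_match)  # AXXATG
```

Inside [...], the | character loses its alternation meaning and is treated as an ordinary character, so repeating A, T and G in the class has no effect.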

Note the {1} is redundant and can be removed.
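
As a quick check of that claim, the sketch below compares the pattern with and without {1} on the question's input. The names with_quantifier, without_quantifier and sample are hypothetical, chosen only for this illustration.

```python
import re

# Same pattern twice: once with the explicit {1}, once without it.
with_quantifier = re.compile(r"[AG].{2}(?:ATG|ATA|AAG).{1}G")
without_quantifier = re.compile(r"[AG].{2}(?:ATG|ATA|AAG).G")

sample = "ZZZAXXATGXGZZZ"
result_with = with_quantifier.search(sample).group()
result_without = without_quantifier.search(sample).group()

# Both patterns return the same match, since '.{1}' and '.' each match one character.
print(result_with == result_without)  # True
```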

Python:

import re
p = re.compile("[AG].{2}(?:ATG|ATA|AAG).G")
regex_result = p.search('ZZZAXXATGXGZZZ')
print(regex_result.group())
# => AXXATGXG


Wiktor Stribiżew answered Jan 26 '26 10:01


