I think pat1 = '[ab]' and pat2 = 'a|b' have the same function in Python(python2.7, windows) 're' module as a regular expression pattern. But I am confused with '[ab]+' and '(a|b)+', do they have the same function, if not can you explain details.
'''
Created on 2012-9-4
@author: melo
'''
import re
pat1 = '(a|b)+'
pat2 = '[ab]+'
text = '22ababbbaa33aaa44b55bb66abaa77babab88'
m1 = re.search(pat1, text)
m2 = re.search(pat2, text)
print 'search with pat1:', m1.group()
print 'search with pat2:', m2.group()
m11 = re.split(pat1, text)
m22 = re.split(pat2, text)
print 'split with pat1:', m11
print 'split with pat2:', m22
m111 = re.findall(pat1, text)
m222 = re.findall(pat2, text)
print 'findall with pat1:', m111
print 'findall with pat2:', m222
output as below:
search with pat1: ababbbaa
search with pat2: ababbbaa
split with pat1: ['22', 'a', '33', 'a', '44', 'b', '55', 'b', '66', 'a', '77', 'b', '88']
split with pat2: ['22', '33', '44', '55', '66', '77', '88']
findall with pat1: ['a', 'a', 'b', 'b', 'a', 'b']
findall with pat2: ['ababbbaa', 'aaa', 'b', 'bb', 'abaa', 'babab']
why are 'pat1' and 'pat2' different and what's their difference? what kind of strings can 'pat1' actually match?
You have a capturing group in the first pattern.
According to the docs,
re.split()
... If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. ...
Try making the group non-capturing and see if you get what you expect:
pat1 = '(?:a|b)+'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With