Does '[ab]+' equal '(a|b)+' in Python's re module?

I think pat1 = '[ab]' and pat2 = 'a|b' behave the same way as regular expression patterns in Python's re module (Python 2.7, Windows). But I am confused about '[ab]+' and '(a|b)+': do they behave the same? If not, can you explain the difference in detail?

'''
Created on 2012-9-4

@author: melo
'''

from __future__ import print_function  # same output on Python 2.7 and Python 3
import re

pat1 = '(a|b)+'
pat2 = '[ab]+'
text = '22ababbbaa33aaa44b55bb66abaa77babab88'

m1 = re.search(pat1, text)
m2 = re.search(pat2, text)
print('search with pat1:', m1.group())
print('search with pat2:', m2.group())

m11 = re.split(pat1, text)
m22 = re.split(pat2, text)
print('split with pat1:', m11)
print('split with pat2:', m22)

m111 = re.findall(pat1, text)
m222 = re.findall(pat2, text)
print('findall with pat1:', m111)
print('findall with pat2:', m222)

The output is as follows:

search with pat1: ababbbaa
search with pat2: ababbbaa
split with pat1: ['22', 'a', '33', 'a', '44', 'b', '55', 'b', '66', 'a', '77', 'b', '88']
split with pat2: ['22', '33', '44', '55', '66', '77', '88']
findall with pat1: ['a', 'a', 'b', 'b', 'a', 'b']
findall with pat2: ['ababbbaa', 'aaa', 'b', 'bb', 'abaa', 'babab']

Why do pat1 and pat2 give different results, and what is the difference between them? What kind of strings can pat1 actually match?

asked Sep 10 '12 at 03:09 by imsrch


1 Answer

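You have a capturing group in the first pattern. Both patterns match exactly the same strings (one or more characters, each of which is a or b), which is why re.search returns the same match for both. The group only changes what the functions return: a group repeated by + keeps just its last iteration, and both re.split and re.findall hand back the group contents when the pattern contains groups.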
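A quick way to see both halves of this: the sketch below brute-forces every string up to length 6 over an arbitrary alphabet and checks that the two patterns accept the same ones, then shows what the group in pat1 actually holds after a match. It assumes Python 3.4 or later, because `fullmatch` is not available in 2.7's re module. The helper `same_matches`, its parameters and the alphabet `'ab2'` are made up for illustration, and a finite check like this is evidence, not a proof.

```python
import itertools
import re

PAT1 = re.compile(r'(a|b)+')  # alternation inside a capturing group
PAT2 = re.compile(r'[ab]+')   # character class, no group


def same_matches(alphabet='ab2', max_len=6):
    """Hypothetical helper: True if PAT1 and PAT2 fully match exactly the
    same strings over `alphabet`, for every length from 0 to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = ''.join(chars)
            if bool(PAT1.fullmatch(s)) != bool(PAT2.fullmatch(s)):
                return False
    return True


text = '22ababbbaa33aaa44b55bb66abaa77babab88'
m = PAT1.search(text)

print(same_matches())  # True
print(m.group(0))      # ababbbaa  (the whole match, same as pat2 finds)
print(m.group(1))      # a         (only the group's last iteration)
```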

According to the docs,

re.split()
... If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. ...
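Taking the question's own split result apart makes the quoted rule visible: the even-indexed entries of the pat1 split are exactly what pat2 gives, and the odd-indexed entries are the group's captures, which re.findall also returns when the pattern has a single group. A minimal sketch reusing the question's `text`; the variable names `parts`, `pieces`, `captures` and `found` are illustrative only.

```python
import re

text = '22ababbbaa33aaa44b55bb66abaa77babab88'

# With one capturing group, re.split inserts that group's capture between
# the pieces, so the result alternates piece, capture, piece, ...
parts = re.split(r'(a|b)+', text)
pieces = parts[0::2]    # the same list that '[ab]+' produces
captures = parts[1::2]  # the last a/b of each run

# re.findall returns the group's capture (not the whole match)
# when the pattern contains exactly one group.
found = re.findall(r'(a|b)+', text)

print(pieces)             # ['22', '33', '44', '55', '66', '77', '88']
print(captures)           # ['a', 'a', 'b', 'b', 'a', 'b']
print(found == captures)  # True
```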

Try making the group non-capturing and see if you get what you expect:

pat1 = '(?:a|b)+'
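Re-running the question's three comparisons with the non-capturing group should show them all agreeing. A short sketch using the question's `text`; the `same_*` variable names are only for illustration.

```python
import re

text = '22ababbbaa33aaa44b55bb66abaa77babab88'
pat1 = r'(?:a|b)+'  # non-capturing version of the original pat1
pat2 = r'[ab]+'

# Without a capturing group, search, split and findall all agree.
same_search = re.search(pat1, text).group() == re.search(pat2, text).group()
same_split = re.split(pat1, text) == re.split(pat2, text)
same_findall = re.findall(pat1, text) == re.findall(pat2, text)

print(same_search and same_split and same_findall)  # True
print(re.findall(pat1, text))  # ['ababbbaa', 'aaa', 'b', 'bb', 'abaa', 'babab']
```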
answered Oct 06 '22 at 05:10 by Wiseguy