
re.findall not returning full match?

I have a file that includes a bunch of strings like "size=XXX;". I am trying Python's re module for the first time and am a bit mystified by the following behavior: if I use a pipe for 'or' in a regular expression, I only see that bit of the match returned. E.g.:

>>> import re
>>> myfile = open('testfile.txt', 'r').read()
>>> re.findall('size=50;', myfile)
['size=50;', 'size=50;', 'size=50;', 'size=50;']
>>> re.findall('size=51;', myfile)
['size=51;', 'size=51;', 'size=51;']
>>> re.findall('size=(50|51);', myfile)
['51', '51', '51', '50', '50', '50', '50']
>>> re.findall(r'size=(50|51);', myfile)
['51', '51', '51', '50', '50', '50', '50']

The "size=" part of the match is gone, yet it is certainly used in the search; otherwise there would be more results. What am I doing wrong?

Ben S. asked Aug 25 '13 03:08



1 Answer

The problem is that when the regex passed to re.findall contains capturing groups (i.e. portions of the regex enclosed in parentheses), it is the groups that are returned rather than the whole matched string.
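To make the rule concrete, here is a small sketch of how the return value changes with the number of capturing groups. The sample string s is made up for illustration and is not the asker's file.

```python
import re

# Hypothetical sample input, standing in for the contents of the file.
s = 'size=50;size=51;'

# No capturing groups: each item is the whole match.
no_groups = re.findall(r'size=50;', s)

# One capturing group: each item is only the text of that group.
one_group = re.findall(r'size=(50|51);', s)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(size)=(50|51);', s)

print(no_groups)   # ['size=50;']
print(one_group)   # ['50', '51']
print(two_groups)  # [('size', '50'), ('size', '51')]
```

The middle case is the one from the question: the whole pattern still has to match, but only the group's text ends up in the list.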

One way to solve this issue is to use non-capturing groups (prefixed with ?:).

>>> import re
>>> s = 'size=50;size=51;'
>>> re.findall('size=(?:50|51);', s)
['size=50;', 'size=51;']

If the regex that re.findall tries to match does not capture anything, it returns the whole of the matched string.
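If you would rather keep the capturing group (for instance because you also want the number on its own), one alternative is re.finditer, which yields match objects instead of strings. This is only a sketch on a made-up string; group(0) is the whole match and group(1) is the first group.

```python
import re

# Hypothetical sample input, standing in for the contents of the file.
s = 'size=50;size=51;'
pattern = re.compile(r'size=(50|51);')

# group(0) is the entire matched text, regardless of capturing groups.
full_matches = [m.group(0) for m in pattern.finditer(s)]

# group(1) is the text captured by the first (and only) group.
numbers = [m.group(1) for m in pattern.finditer(s)]

print(full_matches)  # ['size=50;', 'size=51;']
print(numbers)       # ['50', '51']
```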

Although using character classes might be the simplest option in this particular case, non-capturing groups provide a more general solution.
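For comparison, a rough sketch of both approaches on made-up strings. The character class works here only because 50 and 51 differ in a single character, while the non-capturing group handles alternatives of any shape (the value 128 below is just an illustrative assumption).

```python
import re

# Hypothetical sample inputs for illustration.
s = 'size=50;size=51;size=52;'
t = 'size=50;size=128;size=51;'

# Character class: '5' followed by either '0' or '1'.
char_class = re.findall(r'size=5[01];', s)

# Non-capturing group: same result for this input.
non_capturing = re.findall(r'size=(?:50|51);', s)

# Alternatives that share no common structure need the group form.
general = re.findall(r'size=(?:50|128);', t)

print(char_class)     # ['size=50;', 'size=51;']
print(non_capturing)  # ['size=50;', 'size=51;']
print(general)        # ['size=50;', 'size=128;']
```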

Volatility answered Oct 06 '22 21:10