
python - regex search and findall

I need to find all matches in a string for a given regex. I've been using findall() to do that until I came across a case where it wasn't doing what I expected. For example:

import re

regex = re.compile(r'(\d+,?)+')
s = 'There are 9,000,000 bicycles in Beijing.'

print(re.search(regex, s).group(0))
> 9,000,000

print(re.findall(regex, s))
> ['000']

In this case search() returns what I need (the longest match) but findall() behaves differently, although the docs imply it should be the same:

findall() matches all occurrences of a pattern, not just the first one as search() does.

  • Why is the behaviour different?

  • How can I achieve the result of search() with findall() (or something else)?

asked Nov 13 '11 at 06:11 by armandino




1 Answer

Ok, I see what's going on... from the docs:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

As it turns out, you do have a group, "(\d+,?)". Because the group is repeated, it keeps only the text from its last repetition, so what findall() returns is that last capture: 000.
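To see the difference side by side, here is a small sketch that reuses the pattern and string from the question. The variable names (whole_match, last_capture, found) are only illustrative.

```python
import re

regex = re.compile(r'(\d+,?)+')
s = 'There are 9,000,000 bicycles in Beijing.'

m = regex.search(s)
whole_match = m.group(0)   # text matched by the entire pattern
last_capture = m.group(1)  # a repeated group keeps only its final repetition

# With exactly one capturing group, findall() returns group 1 of each match,
# not the whole match.
found = regex.findall(s)

print(whole_match)   # 9,000,000
print(last_capture)  # 000
print(found)         # ['000']
```

So search() and findall() find the same match here; they just report different parts of it.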

One solution is to surround the entire regex with a group, like this:

regex = re.compile(r'((\d+,?)+)')

Then findall() will return [('9,000,000', '000')], a list with one tuple per match, each tuple holding both groups. Of course, you only care about the first element of each tuple.
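A minimal sketch of picking out just the outer group from those tuples, assuming the wrapped pattern above and the string from the question (pairs and numbers are illustrative names):

```python
import re

regex = re.compile(r'((\d+,?)+)')
s = 'There are 9,000,000 bicycles in Beijing.'

# Two capturing groups, so findall() yields one (outer, inner) tuple per match.
pairs = regex.findall(s)

# Keep only the outer group, which spans the whole number.
numbers = [outer for outer, _inner in pairs]

print(pairs)    # [('9,000,000', '000')]
print(numbers)  # ['9,000,000']
```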

Personally, I would use the following regex:

regex = re.compile(r'((\d+,)*\d+)')

This avoids matching a trailing comma, as in "this is a bad number 9,123,".
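A related option, not part of the answer above, is to make the inner group non-capturing with (?:...). With no capturing groups in the pattern, findall() returns whole matches as plain strings. The sketch below assumes that substitution and compares a loose and a strict variant on the trailing-comma example; the names loose, strict, and bad are illustrative.

```python
import re

# Non-capturing variants: grouping for repetition only, nothing is captured.
loose = re.compile(r'(?:\d+,?)+')    # same shape as the question's pattern
strict = re.compile(r'(?:\d+,)*\d+') # must end on a digit, not a comma

s = 'There are 9,000,000 bicycles in Beijing.'
bad = 'this is a bad number 9,123,'

print(strict.findall(s))    # ['9,000,000']
print(loose.findall(bad))   # ['9,123,']
print(strict.findall(bad))  # ['9,123']
```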

Edit.

Here's a way to avoid having to wrap the expression in parentheses or deal with tuples:

import re

s = "..."
regex = re.compile(r'(\d+,?)+')
it = re.finditer(regex, s)

# group(0) is the whole match, regardless of any capturing groups
for match in it:
    print(match.group(0))

finditer returns an iterator over all the matches found. These match objects are the same kind that re.search returns, so group(0) gives the result you expect.
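For a concrete run, here is a short sketch that collects the whole matches into a list. The sample string is made up for illustration and is not from the question.

```python
import re

regex = re.compile(r'(\d+,?)+')

# Hypothetical input containing two comma-grouped numbers.
sample = 'Order 1,250 units, then 9,000,000 more.'

# Each item from finditer() is a match object; group(0) is the full match.
matches = [m.group(0) for m in regex.finditer(sample)]

print(matches)  # ['1,250', '9,000,000']
```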

answered Sep 20 '22 at 14:09 by aleph_null
