python - regex search and findall

Tags: python, string-matching, regex, search, findall

I need to find all matches in a string for a given regex. I've been using findall() to do that until I came across a case where it wasn't doing what I expected. For example:

import re

regex = re.compile(r'(\d+,?)+')
s = 'There are 9,000,000 bicycles in Beijing.'

print(re.search(regex, s).group(0))
> 9,000,000

print(re.findall(regex, s))
> ['000']

In this case search() returns what I need (the longest match) but findall() behaves differently, although the docs imply it should be the same:

findall() matches all occurrences of a pattern, not just the first one as search() does.

Why is the behaviour different?
How can I achieve the result of search() with findall() (or something else)?

218

asked Nov 13 '11 06:11

armandino

1 Answer

Ok, I see what's going on... from the docs:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

As it turns out, you do have a group, "(\d+,?)", so findall() returns the text captured by that group instead of the whole match. Because the group is repeated, it keeps only its last iteration, which is 000.
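To make the rule from the docs concrete, here is a small sketch comparing what findall() returns for the same string with no group, with the original capturing group, and with a non-capturing group. The non-capturing form `(?:...)` is not part of the answer above; it is shown only as another standard way to group without changing what findall() returns.

```python
import re

s = 'There are 9,000,000 bicycles in Beijing.'

# No capturing group: findall returns each full match.
no_group = re.findall(r'\d+,?', s)

# One capturing group: findall returns that group's text for each match.
# The group is repeated, so only its last iteration survives.
one_group = re.findall(r'(\d+,?)+', s)

# Non-capturing group: groups for repetition without capturing,
# so findall goes back to returning the full match.
non_capturing = re.findall(r'(?:\d+,?)+', s)

print(no_group)       # ['9,', '000,', '000']
print(one_group)      # ['000']
print(non_capturing)  # ['9,000,000']
```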

One solution is to surround the entire regex by a group, like this

regex = re.compile(r'((\d+,?)+)')

Then findall() returns [('9,000,000', '000')]: a list with one tuple per match, each tuple holding the text of both groups. Of course, you only care about the first element.
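A minimal sketch of picking out just the outer group from those tuples. The sample string here is made up for illustration and is not from the question.

```python
import re

regex = re.compile(r'((\d+,?)+)')

# Hypothetical sample text with several numbers in it.
s = 'Compare 9,000,000 with 1,234 and 56.'

# findall yields one (outer_group, inner_group) tuple per match.
pairs = regex.findall(s)

# Keep only the outer group, which spans the whole number.
numbers = [outer for outer, _inner in pairs]

print(pairs)    # [('9,000,000', '000'), ('1,234', '234'), ('56', '56')]
print(numbers)  # ['9,000,000', '1,234', '56']
```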

Personally, I would use the following regex

regex = re.compile(r'((\d+,)*\d+)')

so that a trailing comma is not included in the match, as it would be in something like "this is a bad number 9,123,"
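The difference between the two patterns can be checked on that same example. This is only a sketch; the variable names `loose` and `strict` are mine.

```python
import re

loose = re.compile(r'((\d+,?)+)')    # every chunk may end with a comma
strict = re.compile(r'((\d+,)*\d+)')  # the number must end with a digit

s = 'this is a bad number 9,123,'

loose_match = loose.search(s).group(0)
strict_match = strict.search(s).group(0)

print(loose_match)   # 9,123,
print(strict_match)  # 9,123
```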

Edit.

Here's a way to avoid having to surround the expression with parentheses or deal with tuples

import re

s = "..."
regex = re.compile(r'(\d+,?)+')
it = re.finditer(regex, s)

for match in it:
    print(match.group(0))

finditer() returns an iterator over all the matches found. Each item is the same kind of match object that re.search() returns, so group(0) gives the whole match, which is the result you expect.
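For reference, a runnable version of the same idea using the string from the question. Collecting the results into a list, and recording each match's start offset, are additions for illustration.

```python
import re

regex = re.compile(r'(\d+,?)+')
s = 'There are 9,000,000 bicycles in Beijing.'

# Each item from finditer is a match object, so group(0) is the whole
# match even though the pattern contains a capturing group.
found = [(m.group(0), m.start()) for m in regex.finditer(s)]

print(found)  # [('9,000,000', 10)]
```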

133

answered Sep 20 '22 14:09

aleph_null
