Different behavior between re.finditer and re.findall

Tags: python, regex

I am using the following code:

    CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
    pattern = re.compile(CARRIS_REGEX, re.UNICODE)
    matches = pattern.finditer(mailbody)
    findall = pattern.findall(mailbody)

But finditer and findall are finding different things. findall does find all the matches in the given string, but finditer only finds the first one, returning an iterator with only one element.

How can I make finditer and findall behave the same way?

Thanks

asked Sep 21 '10 22:09

simao

2 Answers

I can't reproduce this here; I have tried it with both Python 2.7 and 3.1.

One difference between finditer and findall is that finditer returns an iterator that yields regex match objects, whereas findall returns a list containing, for each match, a tuple of the matched capturing groups (or the entire matched string if the pattern has no capturing groups).

    import re

    CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
    pattern = re.compile(CARRIS_REGEX, re.UNICODE)
    mailbody = open("test.txt").read()

    for match in pattern.finditer(mailbody):
        print(match)
    print()
    for match in pattern.findall(mailbody):
        print(match)

prints

    <_sre.SRE_Match object at 0x00A63758>
    <_sre.SRE_Match object at 0x00A63F98>
    <_sre.SRE_Match object at 0x00A63758>
    <_sre.SRE_Match object at 0x00A63F98>
    <_sre.SRE_Match object at 0x00A63758>
    <_sre.SRE_Match object at 0x00A63F98>
    <_sre.SRE_Match object at 0x00A63758>
    <_sre.SRE_Match object at 0x00A63F98>

    ('790', 'PR. REAL', '21:06', '04m')
    ('758', 'PORTAS BENFICA', '21:10', '09m')
    ('790', 'PR. REAL', '21:14', '13m')
    ('758', 'PORTAS BENFICA', '21:21', '19m')
    ('790', 'PR. REAL', '21:29', '28m')
    ('758', 'PORTAS BENFICA', '21:38', '36m')
    ('758', 'SETE RIOS', '21:49', '47m')
    ('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you're getting from findall, you need

    for match in pattern.finditer(mailbody):
        print(tuple(match.groups()))
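For anyone who wants to check this without a test.txt file, here is a minimal self-contained sketch. The SAMPLE string is a hypothetical input made up for illustration, modelled on the rows in the output above; it is not the asker's real mail body. It compares the groups collected from finditer against the list returned by findall.

```python
import re

# Hypothetical sample input (an assumption, not the asker's real data),
# modelled on the rows shown in the printed output.
SAMPLE = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>"
    "<tr><th>758</th><th>SETE RIOS</th><th>21:49</th><th>47m</th></tr>"
)

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# findall gives a list of group tuples directly.
from_findall = pattern.findall(SAMPLE)

# finditer gives match objects; groups() extracts the same tuple from each one.
from_finditer = [match.groups() for match in pattern.finditer(SAMPLE)]

print(from_findall == from_finditer)  # True
print(len(from_finditer))             # 3
```

With this input both calls see all three rows, which is consistent with the behaviour not being reproducible from the posted code alone.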

answered Sep 18 '22 14:09

Tim Pietzcker

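re.findall(pattern, string)

findall() returns all non-overlapping matches of pattern in string as a list of strings (or a list of tuples if the pattern has more than one capturing group).

re.finditer(pattern, string)

finditer() returns an iterator that yields a match object for each non-overlapping match.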

In both functions, the string is scanned from left to right and matches are returned in order found.
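A small sketch of what the two functions return may help here. The sample string and the patterns are made up for illustration. It shows how the shape of findall's result depends on the number of capturing groups, and that the object returned by finditer can only be iterated once.

```python
import re

text = "a1 b22 c333"  # hypothetical sample string

# No capturing groups: findall returns a list of whole-match strings.
no_groups = re.findall(r"[a-z]\d+", text)

# One capturing group: a list of strings holding that group for each match.
one_group = re.findall(r"[a-z](\d+)", text)

# Several capturing groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"([a-z])(\d+)", text)

# finditer returns an iterator of match objects, in left-to-right order.
matches = re.finditer(r"([a-z])(\d+)", text)
first_pass = [m.group(0) for m in matches]

# The iterator is one-shot: a second pass over the same object yields nothing.
second_pass = [m.group(0) for m in matches]

print(no_groups)    # ['a1', 'b22', 'c333']
print(one_group)    # ['1', '22', '333']
print(two_groups)   # [('a', '1'), ('b', '22'), ('c', '333')]
print(first_pass)   # ['a1', 'b22', 'c333']
print(second_pass)  # []
```

If the iterator from finditer has already been partly or fully consumed, for example by an earlier loop, a later loop over the same object will see fewer matches. That is one possible explanation for the behaviour in the question, but nothing in the posted code confirms it.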

answered Sep 17 '22 14:09

Ayush
