I am using the following code:
```python
import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
# mailbody holds the text of the e-mail being parsed
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)
```
But finditer and findall are finding different things. Findall indeed finds all the matches in the given string. But finditer only finds the first one, returning an iterator with only one element.
How can I make finditer and findall behave the same way?
Thanks
findall() is used to search for "all" occurrences that match a given pattern. In contrast, search() returns only the first occurrence that matches the pattern. findall() scans the entire string and returns all non-overlapping matches of the pattern in a single step.
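A minimal sketch of the search()/findall() difference, using a made-up string and pattern chosen only for illustration:

```python
import re

text = "cat bat rat"
pattern = re.compile(r"\w+at")

# search() stops at the first match and returns a match object (or None)
first = pattern.search(text)
print(first.group())  # cat

# findall() scans the whole string and returns every non-overlapping match
print(pattern.findall(text))  # ['cat', 'bat', 'rat']
```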
re.finditer() works like re.findall(), except that it returns an iterator yielding a match object for each match instead of a list. It scans the string from left to right, and matches are yielded in the order found.
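To make the contrast concrete, here is a small sketch (the string and pattern are illustrative assumptions) showing that finditer() yields match objects carrying positions as well as text, from which the findall() result can be rebuilt:

```python
import re

text = "a1 b22 c333"
pattern = re.compile(r"[a-z](\d+)")

# findall() builds a list; with one capturing group, each item is that group's text
found = pattern.findall(text)
print(found)  # ['1', '22', '333']

# finditer() lazily yields Match objects, which carry more than just the text
for m in pattern.finditer(text):
    print(m.group(0), m.group(1), m.span())

# Rebuilding the findall() result from the Match objects
print([m.group(1) for m in pattern.finditer(text)] == found)  # True
```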
re.match() and re.search() differ in where they look. Both return the first match of the pattern as a match object, but re.match() only checks at the beginning of the string: if the pattern occurs only somewhere in the middle of the string, it returns None.
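A short sketch of that difference, with a made-up input string:

```python
import re

text = "error: disk full"

# match() only tries at position 0, so a pattern found later gives None
print(re.match(r"disk", text))  # None

# search() scans forward and finds the first occurrence anywhere
m = re.search(r"disk", text)
print(m.group(), m.start())  # disk 7
```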
findall() is probably the single most powerful function in the re module. Whereas re.search() finds only the first match for a pattern, findall() finds *all* the matches and returns them as a list of strings, with each string representing one match (or as a list of tuples when the pattern has more than one capturing group).
I can't reproduce this here. I have tried it with both Python 2.7 and 3.1.
One difference between finditer() and findall() is that the former yields regex match objects, whereas the latter returns a list whose items are the captured groups of each match: a tuple when the pattern has several capturing groups, a single string when it has one, or the entire match when it has none.
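The shape of the findall() result depends on the number of capturing groups, while finditer() always yields match objects. A sketch with an illustrative time string:

```python
import re

text = "10:30 and 11:45"

# No capturing groups: each item is the whole match
print(re.findall(r"\d+:\d+", text))      # ['10:30', '11:45']

# One group: each item is that group's string
print(re.findall(r"(\d+):\d+", text))    # ['10', '11']

# Several groups: each item is a tuple of the groups
print(re.findall(r"(\d+):(\d+)", text))  # [('10', '30'), ('11', '45')]

# finditer() always yields Match objects, whatever the group count
print([m.group(0) for m in re.finditer(r"(\d+):(\d+)", text)])  # ['10:30', '11:45']
```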
So
```python
import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
mailbody = open("test.txt").read()

for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)
```
prints
```
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')
```
If you want the same output from finditer() as you're getting from findall(), you need:

```python
for match in pattern.finditer(mailbody):
    # groups() already returns a tuple of all capturing groups
    print(match.groups())
```
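Since test.txt isn't available here, the following sketch builds a hypothetical stand-in for mailbody from two rows of the output above, and checks that collecting match.groups() from finditer() reproduces the findall() list:

```python
import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# Hypothetical stand-in for test.txt, built from two rows of the output above
mailbody = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>\n"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>\n"
)

from_findall = pattern.findall(mailbody)
from_finditer = [m.groups() for m in pattern.finditer(mailbody)]

print(from_findall == from_finditer)  # True
print(from_finditer[0])  # ('790', 'PR. REAL', '21:06', '04m')
```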
re.findall(pattern, string)
findall() returns all non-overlapping matches of pattern in string as a list of strings (or of tuples, if the pattern has more than one capturing group).
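re.finditer(pattern, string)
finditer() returns an iterator that yields a match object for each non-overlapping match. (In CPython its type is named callable_iterator, but the object itself is not callable.)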
In both functions, the string is scanned from left to right and matches are returned in order found.
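A small sketch (illustrative string) of the left-to-right, non-overlapping scan, and of the fact that the object finditer() returns is a one-shot iterator that is empty once consumed:

```python
import re

text = "aaaa"

# Matches are non-overlapping: after "aa" at 0-2, scanning resumes at index 2
print(re.findall(r"aa", text))  # ['aa', 'aa']

it = re.finditer(r"aa", text)
print(next(it).span())  # (0, 2)
print(next(it).span())  # (2, 4)

# The iterator is not callable, and once consumed it yields nothing more
print(callable(it))  # False
print(list(it))  # []
```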