
Different behavior between re.finditer and re.findall

Tags:

python

regex

I am using the following code:

import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)

But finditer and findall return different results. findall does find every match in the given string, whereas finditer finds only the first one, returning an iterator with a single element.

How can I make finditer and findall behave the same way?

Thanks

simao asked Sep 21 '10 22:09



People also ask

What are the differences between regex methods Findall search and match?

re.findall() is a function that returns all non-overlapping matches of a pattern in a string, in a single call. In contrast, re.search() returns only the first match found anywhere in the string, and re.match() returns a match only if the pattern matches at the beginning of the string.

What does re.finditer do?

re.finditer() finds the same non-overlapping matches as re.findall(), but instead of a list it returns an iterator that yields a match object for each match. It scans the string from left to right, and matches are yielded in the order found.

What is the difference between re.search and re.match?

Both return the first match of the pattern, but re.match() only tries to match at the beginning of the string and returns a match object if that succeeds. If the pattern matches only somewhere in the middle of the string, re.match() returns None, whereas re.search() still finds it.
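The difference can be seen in a minimal sketch. The sample string and the `number` pattern below are made up for illustration and do not come from the question.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "bus 790 arrives at 21:06"
number = re.compile(r"\d+")

# match() only succeeds if the pattern matches at position 0.
at_start = number.match(text)
print(at_start)  # None, because the string starts with "bus"

# search() scans forward and returns the first match anywhere in the string.
anywhere = number.search(text)
print(anywhere.group())  # 790

# match() does succeed when the string itself starts with digits.
print(number.match("790 PR. REAL").group())  # 790
```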

What is the use of re.findall in Python?

findall() is probably the single most powerful function in the re module. Whereas re.search() finds only the first match for a pattern, findall() finds *all* the matches and returns them as a list of strings, with each string representing one match. If the pattern contains more than one capturing group, it returns a list of tuples of the groups instead.
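As a rough illustration, the shape of what findall() returns depends on how many capturing groups the pattern has. The input string and patterns here are invented for the example.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "790 at 21:06, 758 at 21:10"

# No capturing groups: a list of whole matches.
no_groups = re.findall(r"\d+:\d+", text)
# One capturing group: a list of that group's strings.
one_group = re.findall(r"(\d+) at", text)
# Several capturing groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\d+) at (\d+:\d+)", text)

print(no_groups)   # ['21:06', '21:10']
print(one_group)   # ['790', '758']
print(two_groups)  # [('790', '21:06'), ('758', '21:10')]
```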


2 Answers

I can't reproduce this here. Have tried it with both Python 2.7 and 3.1.

One difference between finditer and findall is that the former yields regex match objects, whereas the latter returns a list of tuples of the matched capturing groups (or a list of the entire matches if there are no capturing groups).

So

import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
mailbody = open("test.txt").read()

for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

If you want the same output from finditer as you're getting from findall, you need

for match in pattern.finditer(mailbody):
    print(tuple(match.groups()))
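For readers without the original test.txt, the equivalence can be checked with a small self-contained sketch. The `mailbody` string below is a hypothetical stand-in built from two of the rows shown in the output, since the real input is not part of the question.

```python
import re

CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# Hypothetical stand-in for the mail body; the real input is not shown.
mailbody = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>"
)

# Each match object's groups() is the same tuple findall() would return.
from_finditer = [m.groups() for m in pattern.finditer(mailbody)]
from_findall = pattern.findall(mailbody)

print(from_finditer == from_findall)  # True
print(len(from_finditer))             # 2
```

Since match.groups() already returns a tuple, the list comprehension needs no extra tuple() call.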
Tim Pietzcker answered Sep 18 '22 14:09



re.findall(pattern, string)

findall() returns all non-overlapping matches of pattern in string as a list of strings.

re.finditer(pattern, string)

finditer() returns an iterator that yields a match object for each non-overlapping match.

In both functions, the string is scanned from left to right and matches are returned in order found.
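One practical consequence, sketched here with a made-up input string: the object finditer() returns is a one-shot iterator, so looping over it a second time yields nothing, which can look like missing matches. Whether that is what happened in the question is not known.

```python
import re

pattern = re.compile(r"\d+")
text = "790 758 790"  # hypothetical sample input

matches = pattern.finditer(text)
# An iterator returns itself from iter(); it is not a reusable list.
is_iterator = iter(matches) is matches

first_pass = [m.group() for m in matches]
second_pass = [m.group() for m in matches]  # already exhausted by the first pass

print(is_iterator)  # True
print(first_pass)   # ['790', '758', '790']
print(second_pass)  # []
```

If the matches are needed more than once, wrapping the call in list(...) keeps them all.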

Ayush answered Sep 17 '22 14:09
