
Why do Python findall() and finditer() return empty matches on unanchored .* searches?

Tags: python, regex

The Python docs for findall() and finditer() state that:

Empty matches are included in the result unless they touch the beginning of another match

This can be demonstrated as follows:

In [20]: [m.span() for m in re.finditer('.*', 'test')]
Out[20]: [(0, 4), (4, 4)]

Can anyone tell me, though, why this pattern returns an empty match in the first place? Shouldn't .* consume the entire string and return a single match? And further, why is there no empty match at the end if I anchor the pattern to the beginning of the string? For example:

In [22]: [m.span() for m in re.finditer('^.*', 'test')]
Out[22]: [(0, 4)]
Vortura asked Sep 03 '14 15:09




1 Answer

  1. .* matches zero or more characters, so once the four characters of 'test' are consumed, the scan resumes at position 4, where the zero-length empty string at the end (which doesn't touch the start of any other match) still matches; and
  2. With the ^ anchor, the empty string at the end doesn't match the pattern, because it doesn't start at the start of the string.
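
Both points can be checked directly. The following is only a minimal sketch, not part of the original answer: the variable names are made up for illustration, and the `.+` comparison is an extra added here to show what changes when at least one character is required.

```python
import re

text = "test"

# Unanchored: the greedy match consumes "test", then the scan resumes at
# position 4, where .* can still match zero characters.
unanchored = [m.span() for m in re.finditer(r".*", text)]
print(unanchored)  # [(0, 4), (4, 4)]

# Anchored: without re.MULTILINE, ^ only matches at position 0, so the
# attempt at position 4 fails and no empty match is produced.
anchored = [m.span() for m in re.finditer(r"^.*", text)]
print(anchored)  # [(0, 4)]

# Illustrative extra: requiring at least one character also removes the
# trailing empty match.
one_or_more = [m.span() for m in re.finditer(r".+", text)]
print(one_or_more)  # [(0, 4)]

# Trying each pattern directly at the end position shows the same thing.
end_unanchored = re.compile(r".*").match(text, 4)
end_anchored = re.compile(r"^.*").match(text, 4)
print(end_unanchored.span())  # (4, 4)
print(end_anchored)  # None
```

The last two calls use the optional pos argument of a compiled pattern's match() method. With that argument, ^ still refers to the real beginning of the string rather than to pos, which is why the anchored pattern fails at position 4.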
jonrsharpe answered Oct 14 '22 19:10