Is there a Perl equivalent of Python's re.findall/re.finditer (iterative regex results)?

In Python, compiled regex patterns have a findall method that does the following:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

What's the canonical way of doing this in Perl? A naive algorithm I can think of is along the lines of "while a search and replace with the empty string is successful, do [suite]". I'm hoping there's a nicer way. :-)
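For comparison, the naive approach described above might look roughly like the sketch below. The pattern and sample string are illustrative choices, not taken from the question. It is not equivalent to a single left-to-right scan: deleting a match and rescanning from the start can create new matches where the deleted text used to be.

```perl
use strict;
use warnings;

my $pattern = qr/ab/;

# Naive approach: delete the first match and rescan from the start.
my $copy = 'aabb';                  # work on a copy, since s/// destroys it
my @naive;
while ($copy =~ s/($pattern)//) {   # removes the leftmost match, if any
    push @naive, $1;
}

# m//g in list context scans left to right without modifying the string.
my @global = ('aabb' =~ /$pattern/g);

print scalar(@naive), " vs ", scalar(@global), "\n";   # prints "2 vs 1"
```

Removing the "ab" in the middle of "aabb" leaves a fresh "ab", so the naive loop reports two matches where a left-to-right scan finds only one.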

Thanks in advance!

asked Jan 22 '09 01:01 by cdleary



3 Answers

Use the /g modifier in your match. From the perlop manual:

The "/g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
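A short sketch of the list-context behavior described in that quote, using the sample sentence from kyle's Python session below. Note the two-group case: unlike Python's findall, Perl returns one flat list, so the captures have to be regrouped by hand (here with splice, one of several ways) if you want tuple-like pairs.

```perl
use strict;
use warnings;

my $text = 'I had a sandwich for lunch';

# No capturing groups: every whole match is returned.
my @vowels = $text =~ /[aeiou]/g;          # ('a', 'a', 'a', 'i', 'o', 'u')

# One group: the captured text from each match is returned.
my @initials = $text =~ /\b(\w)/g;         # ('I', 'h', 'a', 's', 'f', 'l')

# Two groups: all captures come back as ONE flat list, not as tuples.
my @flat = $text =~ /([aeiou])([nrs])/g;   # ('a', 'n', 'o', 'r', 'u', 'n')

# Regroup the flat list two at a time to get findall-style pairs.
my @pairs;
push @pairs, [ splice(@flat, 0, 2) ] while @flat;

print join(' ', map { "$_->[0]$_->[1]" } @pairs), "\n";   # prints "an or un"
```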

In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.
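A sketch of the scalar-context form, which is the closest Perl analogue to Python's finditer: each pass of the loop yields one match, and the @- and @+ arrays give its start and end offsets (roughly match.start() and match.end() in Python). The second part illustrates the pos() reset behavior with and without /c. Variable names and sample strings are illustrative only.

```perl
use strict;
use warnings;

my $text = 'I had a sandwich for lunch';
my @found;

# Scalar-context m//g: each test of the condition resumes at pos($text).
while ($text =~ /([aeiou])([nrs])/g) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    push @found, sprintf('%s%s at %d-%d', $1, $2, $-[0], $+[0]);
}
print "$_\n" for @found;    # an at 9-11, or at 18-20, un at 22-24

# A failed m//g resets pos() unless /c is added.
my $s = 'aaXbb';
$s =~ /a+/g;                # matches 'aa'; pos($s) is now 2
$s =~ /z/gc;                # fails, but /c leaves pos($s) at 2
my $kept = pos($s);
$s =~ /z/g;                 # fails without /c, so pos($s) is reset
my $reset = pos($s);        # undef
print "kept=$kept reset=", (defined $reset ? $reset : 'undef'), "\n";
```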

answered Oct 05 '22 05:10 by Chris Jester-Young


To build on Chris' response, it's probably most useful to put the m//g match in the condition of a while loop, like this:

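my $string = 'foobarbaz';
my @matches;
while ( $string =~ m/([aeiou])/g )   # scalar-context m//g: one match per pass
{
    push @matches, $1;               # @matches ends up as ('o', 'o', 'a', 'a')
}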

Pasting some quick Python I/O:

>>> import re
>>> re.findall(r'([aeiou])([nrs])','I had a sandwich for lunch')
[('a', 'n'), ('o', 'r'), ('u', 'n')]

To get something comparable in Perl, the construct could be something like:

my $matches = [];
while ( 'I had a sandwich for lunch' =~ m/([aeiou])([nrs])/g )
{
    push @$matches, [$1,$2];
}
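Generalizing this, the sketch below is a hypothetical findall helper (not a built-in and not from any module) that applies the grouping rules quoted in the question: no groups gives whole matches, one group gives that group, and more than one gives an array reference per match. It relies on $#+, which perlvar describes as the number of subgroups in the last successful match. It has not been checked against every Python edge case (zero-length matches, for example), so treat it as a starting point.

```perl
use strict;
use warnings;

# findall: hypothetical helper mimicking Python's re.findall grouping rules.
sub findall {
    my ($re, $string) = @_;         # $re is a qr// object; $string is a copy
    my @results;
    while ($string =~ /$re/g) {     # scalar-context m//g: one match per pass
        my $ngroups = $#+;          # number of capture groups in the pattern
        my @parts = map {
            my $i = $_;
            defined $-[$i]          # did group $i take part in this match?
                ? substr($string, $-[$i], $+[$i] - $-[$i])
                : '';               # Python reports non-participating groups as ''
        } 0 .. $ngroups;
        if    ($ngroups == 0) { push @results, $parts[0] }                   # whole match
        elsif ($ngroups == 1) { push @results, $parts[1] }                   # the one group
        else                  { push @results, [ @parts[1 .. $ngroups] ] }   # a "tuple"
    }
    return @results;
}

my @pairs = findall(qr/([aeiou])([nrs])/, 'I had a sandwich for lunch');
print join(' ', map { "($_->[0], $_->[1])" } @pairs), "\n";   # prints (a, n) (o, r) (u, n)
```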

In general, though, whatever you want to do with each match can probably be done inside the while loop itself, without collecting the matches first.
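For instance, a per-match task such as tallying vowels (an illustrative example, not from the answer) can run directly in the loop body, with no intermediate list:

```perl
use strict;
use warnings;

my $text = 'foobarbaz';
my %count;

# The work happens as each match is found; nothing is collected first.
while ($text =~ /([aeiou])/g) {
    $count{$1}++;
}

print "$_: $count{$_}\n" for sort keys %count;   # prints "a: 2" then "o: 2"
```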

answered Oct 05 '22 05:10 by kyle


A nice beginner reference with content similar to @kyle's answer: Perl Tutorial: Using regular expressions

answered Oct 05 '22 04:10 by cdleary