
perl: best way to match, save and replace a regex globally

Tags: regex, perl
I want to find all matches of a regex in a string, save the matches, and replace them. Is there a slick way to do that?

Example:

my $re = qr{\wat};
my $text = "a cat a hat the bat some fat for a rat";
... (substitute $re -> 'xxx' saving matches in @matches)
# $text -> 'a xxx a xxx the xxx some xxx for a xxx'
# @matches -> qw(cat hat bat fat rat)

I've tried: @matches = ($text =~ s{($re)}{xxx}g) but it gives me a count.
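For reference, s///g returns the number of substitutions made, in list context as well as scalar context, so the assignment above stores only that count. A minimal sketch of the behaviour, reusing the example string:

```perl
use strict;
use warnings;

my $re   = qr{\wat};
my $text = "a cat a hat the bat some fat for a rat";

# s///g returns how many substitutions were made, even in list context,
# so @matches receives a single element: the count.
my @matches = ($text =~ s{($re)}{xxx}g);

print "matches = @matches\n";   # matches = 5
print "text = $text\n";         # text = a xxx a xxx the xxx some xxx for a xxx
```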

Do I have to add some executable code onto the end of pattern $re?

Update: Here is a method which uses the code execution extended pattern (?{ ... }):

use feature 'say';
use Data::Dumper;
use re 'eval';  # perl complained otherwise
my $re = qr{\wat};
my $text = "a cat a hat the bat some fat for a rat";

my @x;
$text =~ s{ ($re)(?{ push(@x, $1)}) }{xxx}gx;

say "text = $text";
say Dumper(\@x);
ErikR asked Aug 13 '11 00:08



2 Answers

If by "slick" you mean "employs uncommonly-used language features" or "will make other programmers scratch their heads," then maybe this is the solution for you:

my ($temp, @matches);

push @matches, \substr($text, $-[0], $+[0] - $-[0]) while $text =~ /\wat/g;

$temp = $$_, $$_ = 'xxx', $_ = $temp for reverse @matches;
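Spelled out, the idea seems to be: @- and @+ give the offsets of each match, a reference to substr() acts as a handle onto that slice of $text, and working backwards keeps the earlier offsets valid. The sketch below is a more verbose variation, not the code above verbatim: the names @refs and @saved are illustrative, and the matched strings are collected into a separate array instead of overwriting the references in place.

```perl
use strict;
use warnings;

my $text = "a cat a hat the bat some fat for a rat";
my @refs;

# $-[0] and $+[0] are the start and end offsets of the last successful
# match, so substr() covers exactly the matched text. A reference to that
# substr() can be read from and assigned through later.
while ($text =~ /\wat/g) {
    push @refs, \substr($text, $-[0], $+[0] - $-[0]);
}

# Go from the last match to the first so that offsets of earlier matches
# would stay valid even if the replacement had a different length.
my @saved;
for my $ref (reverse @refs) {
    unshift @saved, $$ref;   # read the matched text through the reference
    $$ref = 'xxx';           # assigning through it rewrites $text in place
}

print "text = $text\n";
print "saved = @saved\n";
```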
Sean answered Oct 02 '22 15:10


This is similar to the approach in your Update, but a bit easier to read:

$text =~ s/($re)/push @x, $1; 'xxx'/ge;
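The /e modifier makes Perl evaluate the replacement as code for each match, and the value of the last expression ('xxx' here) becomes the replacement text. A self-contained sketch with the question's sample data:

```perl
use strict;
use warnings;

my $re   = qr{\wat};
my $text = "a cat a hat the bat some fat for a rat";
my @x;

# For every match, the replacement block runs: it saves the captured
# text, then yields 'xxx' as the string to substitute.
$text =~ s/($re)/push @x, $1; 'xxx'/ge;

print "text = $text\n";   # text = a xxx a xxx the xxx some xxx for a xxx
print "x = @x\n";         # x = cat hat bat fat rat
```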

Or this way (probably slower):

push @x, $1 while $text =~ s/($re)/xxx/;

But, really, is there anything wrong with unslick?

my @x = $text =~ /($re)/g;
$text =~ s/($re)/xxx/g;
FMc answered Oct 02 '22 16:10