 

"eager" regexp matching

Tags: regex, perl

I have to remove the string between two delimiters, e.g. from "123XabcX321" I want "123321". For a simple case, I'm fine with:

$_=<>;
s/X(.*)X//;
print;

But if there's ambiguity in the input like "123XabcXasdfjXasdX321", it matches the first X with the last X and I get "123321" but I want "123asdfj321". Is there a way to specify an "eager" match that matches with the first valid possible delimiter and not the last?

GClaramunt asked Mar 28 '11 02:03



2 Answers

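It's normally called "ungreedy" (or "non-greedy") matching: you put a ? after the quantifier, as in s/X(.*?)X//; and add the /g flag if every delimited span in the input should be removed, not just the first.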
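As a rough illustration of the difference, here is a minimal sketch that runs the greedy and the non-greedy pattern over the ambiguous input from the question. The helper names strip_greedy and strip_lazy are made up for this example, and the /g flag is added on the assumption that every X...X span should go.

```perl
use strict;
use warnings;

# Hypothetical helper: default (greedy) quantifier, spans from the
# first X to the last X in the string.
sub strip_greedy {
    my ($s) = @_;
    $s =~ s/X.*X//g;
    return $s;
}

# Hypothetical helper: non-greedy quantifier, each match stops at the
# nearest closing X; /g repeats the substitution for later spans.
sub strip_lazy {
    my ($s) = @_;
    $s =~ s/X.*?X//g;
    return $s;
}

print strip_greedy("123XabcXasdfjXasdX321"), "\n";  # 123321
print strip_lazy("123XabcXasdfjXasdX321"), "\n";    # 123asdfj321
```

Without /g, the non-greedy version would remove only "XabcX" and leave "123asdfjXasdX321".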

Anomie answered Sep 22 '22 10:09



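If you can, avoid relying on the non-greedy modifier as anything but a performance hint. Using it can lead to "unexpected" results, because adding ? doesn't actually prevent .* from matching anything, including X. It only makes the engine try shorter matches first, and the match still grows past an X whenever that is the only way for the rest of the pattern to succeed. For example,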

$ perl -le'print for "XaXbXY" =~ /X(.*?)XY/;'
aXb

To avoid matching X, you can use the following:

s/X[^X]*X//g;
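To make the contrast concrete, the sketch below applies the negated character class to the question's input and then repeats the "XaXbXY" match with both patterns. The helper name strip_negated is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: [^X]* can never consume an X, so each match is
# confined to a single X...X pair.
sub strip_negated {
    my ($s) = @_;
    $s =~ s/X[^X]*X//g;
    return $s;
}

print strip_negated("123XabcXasdfjXasdX321"), "\n";  # 123asdfj321

# Same input as the one-liner above: the non-greedy capture still
# swallows an X, the negated class cannot.
my ($lazy)    = "XaXbXY" =~ /X(.*?)XY/;
my ($negated) = "XaXbXY" =~ /X([^X]*)XY/;
print "$lazy\n";     # aXb
print "$negated\n";  # b
```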

If X really stands for a delimiter longer than one character, you can use the following:

s/X(?:(?!X).)*X//g;
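A minimal sketch of that pattern with a two-character delimiter might look like this. The helper strip_between and the "--" delimiter are assumptions made for illustration. \Q...\E quotes the delimiter so that any regex metacharacters in it are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: $delim is an arbitrary literal delimiter string.
# (?:(?!DELIM).)* consumes one character at a time, but only while the
# delimiter does not start at the current position.
sub strip_between {
    my ($s, $delim) = @_;
    $s =~ s/\Q$delim\E(?:(?!\Q$delim\E).)*\Q$delim\E//g;
    return $s;
}

print strip_between("123--abc--asdfj--asd--321", "--"), "\n";  # 123asdfj321
print strip_between("1--a-b--2", "--"), "\n";                  # 12
```

Because . does not match a newline by default, a delimited span that crosses lines would need the /s flag as well.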
ikegami answered Sep 22 '22 10:09
