
Why does my non-greedy Perl regex still match too much?

Say, I have a line that contains the following string:

"$tom" said blah blah blash.  "$dick" said "blah blah blah". "$harry" said blah blah blah.

and I want to extract

"$dick" said "blah blah blah"

I have the following code:

my ($term) = /(".+?" said ".+?")/g;
print $term;

But it gives me more than I need:

"$tom" said blah blah blash.  "$dick" said "blah blah blah"

I tried grouping my pattern as a whole by using the non-capturing parens:

my ($term) = /((?:".+?" said ".+?"))/g;

But the problem persists.

I've reread the Nongreedy Quantifiers section of Learning Perl but it's got me nowhere so far.

Thanks for any guidance you can generously offer :)

Mike asked Oct 21 '09 05:10


1 Answer

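The problem is that a non-greedy quantifier still keeps trying: it matches as little as it can, but it will extend the match as far as needed for the rest of the pattern to succeed. The regex doesn't see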

"$tom" said blah blah blash.

and think "Oh, the text following "said" isn't quoted, so I'll skip that one." It thinks "Well, the text after "said" isn't quoted, so it must still be part of the first quote." So the first ".+?" matches

"$tom" said blah blah blash.  "$dick"
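
To see this for yourself, here is a minimal sketch that runs the pattern from the question with each quoted part in its own capture group. The variable names $line, $first and $second are only for illustration, and the string is single-quoted so that $tom, $dick and $harry stay literal text.

```perl
use strict;
use warnings;

# Single-quoted so that $tom, $dick and $harry are not interpolated.
my $line = '"$tom" said blah blah blash.  "$dick" said "blah blah blah". "$harry" said blah blah blah.';

# Same pattern as in the question, but each ".+?" is captured separately.
my ($first, $second) = $line =~ /(".+?") said (".+?")/;

print "first:  $first\n";   # first:  "$tom" said blah blah blash.  "$dick"
print "second: $second\n";  # second: "blah blah blah"
```

The match starts at the first quote mark, and the first .+? grows until a quote-then-" said "-then-quote sequence finally follows it.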

What you want is "[^"]+". This matches two quote marks enclosing one or more characters that are not quote marks, so a quoted part can never run past its own closing quote. So the final solution is:

("[^"]+" said "[^"]+")
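
Dropped into a small script, that pattern could be used roughly like this. It is only a sketch: $line and $term are illustrative names, and it assumes the line is held in a variable rather than in $_ as in the question.

```perl
use strict;
use warnings;

# Single-quoted so that $tom, $dick and $harry are not interpolated.
my $line = '"$tom" said blah blah blash.  "$dick" said "blah blah blah". "$harry" said blah blah blah.';

# [^"]+ cannot cross a quote mark, so each quoted part stays
# inside a single pair of quotes.
my ($term) = $line =~ /("[^"]+" said "[^"]+")/;

print "$term\n";  # "$dick" said "blah blah blah"
```

The /g flag from the question is left out here because only the first match is wanted.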

Chris Lutz answered Oct 05 '22 06:10