
Why does this RegEx work the way I want it to?

Tags:

regex

I have a RegEx that is working for me but I don't know WHY it is working for me. I'll explain.

RegEx: \s*<in.*="(<?.*?>)"\s*/>\s*


Text it finds (it finds the white-space before and after the input tag):

<td class="style9">
      <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />    </td>
</tr>


The part I don't understand:

<in.*=" <--- As I understand it, this should only match up to the first =", i.e. it should only find <input name="

It actually finds: <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value=" which happened to be what I was trying to do.

What am I not understanding about this RegEx?

Haabda asked Nov 04 '08 18:11



1 Answer

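You appear to be using 'greedy' matching, which is the default for quantifiers such as * and +.

Greedy matching says "eat as much as possible while still letting the rest of the pattern match", so the .* in <in.*=" runs to the last =" that works (here, the one in value=") rather than the first.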
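To see the greedy behaviour concretely, here is a minimal sketch in Python. Python is my choice for illustration only; the question does not say which regex engine is in use, though greedy quantifiers behave this way in most of them. The variable names are made up for the example.

```python
import re

# The <input> line from the question, split over two literals for readability.
line = ('<input name="guarantor4" id="guarantor4" size="50" type="text" '
        'tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />')

# Greedy .* first consumes the rest of the line, then backtracks
# until =" can match, which leaves it at the LAST =" on the line.
greedy = re.search(r'<in.*="', line).group(0)

print(greedy)
# <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="
```

The match ends at value=" because no later =" exists on the line, which is exactly what the question observed.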

Try

<in[^=]*=

for starters. [^=] matches any character except "=", so the match can no longer swallow an "=" the way ".*" does.

In future, though, you might want to read up on the

.*?

and

.+?

notation ('non-greedy' or 'lazy' matching), which stops at the first possible position that matches instead of the last.
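As a rough sketch of both suggestions side by side (again in Python, chosen only for illustration, with made-up variable names), the negated character class and the non-greedy quantifier stop at the same place when the terminator is a single "=":

```python
import re

# The <input> line from the question.
line = ('<input name="guarantor4" id="guarantor4" size="50" type="text" '
        'tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />')

# [^=]* cannot cross an "=", so it must stop at the first one.
negated = re.search(r'<in[^=]*=', line).group(0)

# .*? grows one character at a time and stops as soon as "=" can match.
lazy = re.search(r'<in.*?=', line).group(0)

print(negated)  # <input name=
print(lazy)     # <input name=
```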

The 'non-greedy' syntax is the better choice when you only want to stop at a sequence of TWO or more characters, since a negated character class can only exclude single characters,

e.g.:

<in.*?=id

which would stop at the first '=id', regardless of whether or not there are other '=' characters before it.
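A small sketch of that difference, in Python for illustration. The sample string below is invented for the example (the tag in the question contains no "=id"), and the variable names are hypothetical.

```python
import re

# Made-up sample with two "=id" occurrences and other "=" before them.
sample = '<input a=1 b=2 c=id d=id>'

# Non-greedy: stops at the FIRST "=id", skipping over earlier "=".
lazy = re.search(r'<in.*?=id', sample).group(0)

# Greedy: runs on to the LAST "=id".
greedy = re.search(r'<in.*=id', sample).group(0)

# Negated class: [^=]* cannot get past "a=", and "=1" is not "=id",
# so there is no match at all.
negated = re.search(r'<in[^=]*=id', sample)

print(lazy)     # <input a=1 b=2 c=id
print(greedy)   # <input a=1 b=2 c=id d=id
print(negated)  # None
```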

Kent Fredric answered Oct 20 '22 21:10
