Why does this RegEx work the way I want it to?

Question

I have a RegEx that is working for me but I don't know WHY it is working for me. I'll explain.

RegEx: \s*<in.*="(<?.*?>)"\s*/>\s*

Text it finds (it finds the white-space before and after the input tag):

<td class="style9">
      <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />    </td>
</tr>

The part I don't understand:

<in.*=" <--- As I understand it, this should only find up to the first =" as in it should only find <input name="

It actually finds: <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value=" which happened to be what I was trying to do.

What am I not understanding about this RegEx?

Kent Fredric · Accepted Answer

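Your ".*" is doing 'greedy' matching.

Greedy matching says "eat as much as possible while still letting the rest of the pattern match", so <in.*=" runs to the last =" from which the remainder of your RegEx can still succeed, not the first.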
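
To see the greedy behaviour concretely, here is a minimal sketch in Python's re module. The question does not say which regex engine or editor is in use, so Python is an assumption, and the variable names are purely illustrative. It runs the pattern from the question against the input line from the question.

```python
import re

# The <input> line from the question (split across two string literals).
line = ('<input name="guarantor4" id="guarantor4" size="50" type="text" '
        'tabindex="10" value="<?php echo $data[guarantor4]; ?>"  />')

# Greedy: .* first grabs the rest of the line, then backs off one character
# at a time until =" can match, so it ends at the LAST =" on the line.
greedy = re.search(r'<in.*="', line).group()
print(greedy)

# The full pattern from the question; group 1 captures the PHP snippet.
full = re.search(r'\s*<in.*="(<?.*?>)"\s*/>\s*', line)
print(full.group(1))
```

The first print shows everything from <input up to and including value=", which is exactly what the question observed.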

Try

<in[^=]*=

for starters. [^=]* cannot match an "=", so the match stops at the first one instead of swallowing it the way ".*" does.
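
A small sketch of the negated character class, again assuming Python's re module and using a shortened, hypothetical version of the question's input line:

```python
import re

# Shortened, hypothetical version of the <input> line from the question.
line = '<input name="guarantor4" id="guarantor4" value="x" />'

# [^=]* can never step over an "=", so the match has to end at the first one.
first = re.search(r'<in[^=]*=', line).group()
print(first)  # <input name=
```

One difference worth knowing: in Python, a negated class like [^=] also matches newlines, whereas . does not by default.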

In future, you might also want to read up on the

.*?

and

.+?

notation ('non-greedy' or 'lazy' matching), which stops at the first position where the rest of the pattern can match instead of the last.
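
As a rough illustration of the two lazy forms (Python's re module is assumed here, and the sample line is a hypothetical shortened input tag with an empty value attribute):

```python
import re

# Hypothetical shortened input tag; note the empty value="" attribute.
line = '<input name="g4" id="g4" value="" />'

# Lazy: .*? takes as little as possible, so the match ends at the FIRST ="
lazy = re.search(r'<in.*?="', line).group()
print(lazy)  # <input name="

# .*? is allowed to match nothing, so the empty value is found too.
star = re.findall(r'"(.*?)"', line)
print(star)  # ['g4', 'g4', '']

# .+? needs at least one character, so the empty value is skipped.
plus = re.findall(r'"(.+?)"', line)
print(plus)  # ['g4', 'g4']
```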

The 'non-greedy' syntax is the better choice when the match should stop only at a sequence of two or more characters,

e.g.:

<in.*?=id

which would stop at the first '=id' regardless of whether there are other '=' characters before it.
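
The difference can be sketched side by side, assuming Python's re module. The text below is hypothetical and chosen only so that a plain "=" occurs before the first "=id":

```python
import re

# Hypothetical text: a plain "=" appears before the first "=id".
text = '<input a=1 b=id c=id>'

# Lazy: stops at the first "=id", stepping over the earlier "=".
lazy = re.search(r'<in.*?=id', text).group()
print(lazy)  # <input a=1 b=id

# Greedy: runs on to the last "=id".
greedy = re.search(r'<in.*=id', text).group()
print(greedy)  # <input a=1 b=id c=id

# Negated class: cannot step over the first "=", which is followed by "1",
# so there is no match at all.
negated = re.search(r'<in[^=]*=id', text)
print(negated)  # None
```

This is the case where the negated class is not enough: it is forced to stop at the very first "=", even when that is not the start of the sequence you are after.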

Tags:

regex

Haabda

1 Answers

Kent Fredric
