
Why does order matter in this regex with alternation?

Requirements for a TextBox control were to accept the following as valid inputs:

  1. A sequence of numbers.
  2. Literal string 'Number of rooms'.
  3. No value at all (left blank). Not specifying a value at all should allow for the RegularExpressionValidator to pass.

The following regex yielded the desired results (it successfully validated all three types of input):

"Number of rooms|[0-9]*"

However, I couldn't come up with an explanation when a colleague asked why the following fails to validate when the string 'Number of rooms' is specified (requirement #2):

"[0-9]*|Number of rooms"

An explanation as to why the ordering of alternation matters in this case would be very insightful indeed.

UPDATE:

The second regex successfully matches the target string "Number of rooms" in a console app, as shown here. However, the identical expression in aspx markup does not match when the input is "Number of rooms". Here is the relevant aspx markup:

<asp:TextBox runat="server" ID="textbox1" >
</asp:TextBox>

<asp:RegularExpressionValidator ID="RegularExpressionValidator1" 
EnableClientScript="false" runat="server" ControlToValidate="textbox1" 
ValidationExpression="[0-9]*|Number of rooms" 
ErrorMessage="RegularExpressionValidator"></asp:RegularExpressionValidator>

<asp:Button ID="Button1" runat="server" Text="Button" />

Abhinav asked Apr 20 '12 15:04


2 Answers

The order matters because the regex engine tries the alternatives from left to right and stops at the first one that matches.

Case 1: Number of rooms|[0-9]*

In this case the regex engine will first try to match the text "Number of rooms". If that fails, it will then try to match digits or nothing.

Case 2: [0-9]*|Number of rooms

In this case the engine will first try to match digits or nothing. Because "Number of rooms" does not start with a digit, [0-9]* matches the empty string at position 0, and an empty match always succeeds. The engine therefore never needs to try "Number of rooms".

This is kind of like the || operator in C#. Once the left side matches, the right side is ignored.
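
To make the two cases concrete, here is a minimal console sketch that runs both patterns against the same input with a plain Regex.Match call. The class name is only illustrative; the patterns and the input come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class AlternationOrderDemo
{
    static void Main()
    {
        const string input = "Number of rooms";

        // Literal alternative first: it matches the whole input at index 0.
        Match literalFirst = Regex.Match(input, "Number of rooms|[0-9]*");
        Console.WriteLine($"literal first: Index={literalFirst.Index}, Length={literalFirst.Length}");
        // prints "literal first: Index=0, Length=15"

        // Digits alternative first: [0-9]* matches the empty string at index 0,
        // so the engine reports success without ever trying the literal.
        Match digitsFirst = Regex.Match(input, "[0-9]*|Number of rooms");
        Console.WriteLine($"digits first: Index={digitsFirst.Index}, Length={digitsFirst.Length}");
        // prints "digits first: Index=0, Length=0"
    }
}
```

Both calls report Success; the only difference is how much of the input the match covers.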

Update: To answer your second question, it behaves differently with the RegularExpressionValidator because the validator does more than just check for a match.

// .....
Match m = Regex.Match(controlValue, ValidationExpression);
return(m.Success && m.Index == 0 && m.Length == controlValue.Length); 
// .....

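It checks that there is a match, that the match starts at index 0, and that the match is as long as the whole string. This rules out partial matches, and it rules out empty matches whenever the input is not empty. With [0-9]*|Number of rooms and the input "Number of rooms", the match is the empty string at index 0, so the length check fails.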
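
The check quoted above can be tried outside a web page with a small sketch. IsWholeStringMatch is a hypothetical helper written for this illustration, not part of the framework; it only repeats the three conditions from the snippet and ignores anything else the validator may do.

```csharp
using System;
using System.Text.RegularExpressions;

static class ValidatorCheckDemo
{
    // Hypothetical helper mirroring the quoted check: a match must exist,
    // start at index 0, and be as long as the whole input.
    public static bool IsWholeStringMatch(string controlValue, string validationExpression)
    {
        Match m = Regex.Match(controlValue, validationExpression);
        return m.Success && m.Index == 0 && m.Length == controlValue.Length;
    }

    static void Main()
    {
        string[] patterns = { "Number of rooms|[0-9]*", "[0-9]*|Number of rooms" };
        string[] inputs = { "42", "Number of rooms", "" };

        foreach (string pattern in patterns)
        {
            foreach (string input in inputs)
            {
                Console.WriteLine($"{pattern,-24} \"{input}\" -> {IsWholeStringMatch(input, pattern)}");
            }
        }
    }
}
```

Of the six combinations, only the digits-first pattern with the input "Number of rooms" yields False, which is the behaviour described in the question.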

Matthew Manela answered Oct 21 '22 09:10


The point is that [0-9]*, when it is listed first, matches the empty string at the start of the input.
If you specify that the whole string must consist of digits, then it should work:

^[0-9]*$|Number of rooms

Unless you specify ^ and $, to indicate that the whole string must be a match, an empty string will be matched at the beginning of "Number of rooms", and at that point the second alternative will not be tried out.
I hope this answers your question in the comment, I'm not sure if it's clear...
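
As a rough check of the anchored pattern, the sketch below applies the same three conditions the validator is described as using (match found, index 0, full length). IsWholeStringMatch is again a hypothetical helper, and "42 rooms" is an extra input added here only to show a rejection.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchoredAlternationDemo
{
    // Hypothetical helper repeating the validator-style whole-string check.
    public static bool IsWholeStringMatch(string value, string pattern)
    {
        Match m = Regex.Match(value, pattern);
        return m.Success && m.Index == 0 && m.Length == value.Length;
    }

    static void Main()
    {
        // ^ and $ apply only to the digits alternative, so an empty match at the
        // start of "Number of rooms" is rejected and the literal gets its turn.
        const string anchored = "^[0-9]*$|Number of rooms";

        foreach (string input in new[] { "42", "Number of rooms", "", "42 rooms" })
        {
            Console.WriteLine($"\"{input}\" -> {IsWholeStringMatch(input, anchored)}");
        }
    }
}
```

The first three inputs are accepted and "42 rooms" is rejected, so the digits-first ordering works once the digits alternative is anchored.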

Paolo Tedesco answered Oct 21 '22 08:10