Why does ^*$ match "127.0.0.1"?

Tags: c#, regex

I don't understand why the following regular expression:

^*$

matches the string "127.0.0.1" when I call Regex.IsMatch("127.0.0.1", "^*$").

In Expresso it does not match, which is what I would expect. The expression ^.*$ does match the string, which I would also expect.

Technically, ^*$ should match the beginning of the string/line any number of times, followed by the end of the string/line. It seems that * is implicitly treated as .*.

What am I missing?

EDIT: Run the following to see an example of the problem.

using System;
using System.Text.RegularExpressions;

namespace RegexFubar
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Regex.IsMatch("127.0.0.1", "^*$"));
            Console.Read();
        }
    }
}

I do not want ^*$ to match my string; I am wondering why it does. I would expect the expression to throw an exception, or at least to produce a non-match.

EDIT2: To clear up any confusion: I did not write this regex with the intention of having it match "127.0.0.1". A user of our application entered the expression and wondered why it matched the string when it should not. After looking at it, I could not come up with an explanation for why it matched, especially since Expresso and .NET seem to handle it differently.

I guess the answer is that the .NET implementation avoids throwing an exception, even though the expression is technically incorrect. But is this really what we want?

asked Oct 21 '08 11:10 by Mark S. Rasmussen



2 Answers

Well, theoretically you are right: it should not match. But this depends on how the implementation works internally. Most regex implementations take your regex, strip the ^ from the front (noting that it must match from the start of the string) and strip the $ from the end (noting that it must match to the end of the string). What is left over is just "*", and "*" on its own is a valid regex. The implementation you are using is simply wrong in how it handles this. You could try replacing "^*$" with just "*"; I guess it will also match everything. It seems the implementation treats a single asterisk like ".*".
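The guess about a bare "*" can be checked directly. The sketch below (C#, like the question) runs both patterns through Regex.IsMatch; TryIsMatch is a hypothetical helper that reports a rejected pattern instead of letting the exception escape. Current .NET versions reject a bare "*" at parse time with a "Quantifier {x,y} following nothing" error, while "^*$" is accepted, so the two cases are worth comparing on your own runtime.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: compare how .NET treats "^*$" and a bare "*".
// TryIsMatch is a hypothetical helper written only for this check.
public static class BareStarCheck
{
    // Returns the IsMatch result, or null when .NET rejects the pattern.
    public static bool? TryIsMatch(string input, string pattern)
    {
        try
        {
            return Regex.IsMatch(input, pattern);
        }
        catch (ArgumentException ex)
        {
            // Invalid patterns surface as ArgumentException
            // (newer runtimes throw RegexParseException, a subclass of it).
            Console.WriteLine($"\"{pattern}\" rejected: {ex.Message}");
            return null;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryIsMatch("127.0.0.1", "^*$")); // True, as reported in the question
        Console.WriteLine(TryIsMatch("127.0.0.1", "*"));   // null result prints an empty line
    }
}
```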

According to the ISO/IEC 9945-2:1993 standard, which is also described in the POSIX standard, it is broken: the standard says that in a basic regular expression (BRE), an asterisk directly after a leading ^ has no special meaning at all. That means "^*$" should actually match only a single string, and that string is "*"!

To quote the standard:

The asterisk is special except when used:

  • in a bracket expression
  • as the first character of an entire BRE (after an initial ^, if any)
  • as the first character of a subexpression (after an initial ^, if any); see BREs Matching Multiple Characters.

So if the asterisk is the first character (a leading ^ does not count as the first character), it has no special meaning. In this case it should match exactly one character: a literal asterisk.
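.NET has no POSIX BRE mode, so the BRE reading of ^*$ can only be emulated by escaping the asterisk. The sketch below assumes POSIX string semantics without REG_NEWLINE, which is why it uses \z rather than $ (in .NET, $ also matches just before a trailing newline). The class and member names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: the POSIX BRE reading of ^*$ ("*" after a leading "^" is literal),
// expressed as a .NET pattern by escaping the asterisk.
public static class PosixBreReading
{
    // \z instead of $: .NET's $ would also accept "*\n".
    public const string LiteralStarPattern = @"^\*\z";

    public static bool MatchesBreReading(string input) =>
        Regex.IsMatch(input, LiteralStarPattern);

    public static void Main()
    {
        foreach (var input in new[] { "*", "127.0.0.1", "" })
        {
            Console.WriteLine($"\"{input}\" -> {MatchesBreReading(input)}");
        }
    }
}
```

Under this reading only the one-character string "*" matches.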


Update

Microsoft says:

Microsoft .NET Framework regular expressions incorporate the most popular features of other regular expression implementations such as those in Perl and awk. Designed to be compatible with Perl 5 regular expressions, .NET Framework regular expressions include features not yet seen in other implementations, such as right-to-left matching and on-the-fly compilation.

Source: http://msdn.microsoft.com/en-us/library/hs600312.aspx

Okay, let's test this:

# echo -n 127.0.0.1 | perl -n -e 'print (($_ =~ m/(^.*$)/)[0]),"\n";'
-> 127.0.0.1
# echo -n 127.0.0.1 | perl -n -e 'print (($_ =~ m/(^*$)/)[0]),"\n";'
->

Nope, it does not. Perl works correctly: ^.*$ matches the string and ^*$ doesn't, so .NET's regex implementation is broken and does not work like Perl 5, as Microsoft claims it does.
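The Perl one-liners print only the first capture group, so an empty capture and a failed match both print nothing. For a like-for-like comparison, the sketch below runs the same two capturing patterns through .NET and also reports Success and Index, which tells those two cases apart. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: run the same captures as the Perl one-liners through .NET,
// printing Success and Index so "no match" and "empty match" differ.
public static class PerlComparison
{
    public static void Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        Console.WriteLine(m.Success
            ? $"{pattern}: match at index {m.Index}, captured \"{m.Groups[1].Value}\""
            : $"{pattern}: no match");
    }

    public static void Main()
    {
        Describe("127.0.0.1", "(^.*$)");
        Describe("127.0.0.1", "(^*$)");
    }
}
```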

answered Oct 27 '22 01:10 by Mecki


Asterisk (*) matches the preceding element ZERO OR MORE times. If you want one or more, use the + operator instead of the *.

You are asking it to match an optional start-of-string marker followed by the end-of-string marker. In other words, when the start-of-string marker is repeated zero times, you are only looking for the end-of-string marker, which will match any string!
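A minimal sketch of this reading: if the leading anchor may repeat zero times, ^*$ should behave like a bare $, whereas ^+$ (using the + operator mentioned above) requires the start anchor and so only matches where start and end coincide. The inputs are arbitrary examples and the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: compare "^*$" with "$" (start anchor repeated zero times)
// and with "^+$" (start anchor required at least once).
public static class QuantifiedAnchor
{
    public static readonly string[] Inputs = { "127.0.0.1", "", "abc" };

    public static void Main()
    {
        foreach (string input in Inputs)
        {
            bool star = Regex.IsMatch(input, "^*$");
            bool dollar = Regex.IsMatch(input, "$");
            bool plus = Regex.IsMatch(input, "^+$");
            Console.WriteLine($"\"{input}\": ^*$={star}, $={dollar}, ^+$={plus}");
        }
    }
}
```

For these inputs the ^*$ and $ columns agree, and ^+$ is true only for the empty string.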

I don't really understand what you are trying to do. If you could give us more information then maybe I could tell you what you should have done :)

answered Oct 26 '22 23:10 by Jon Grant