
Regex.Match, startat and ^ (start of string)

Tags:

c#

regex

Does anyone know why the output of this code:

Regex re = new Regex("^bar", RegexOptions.Compiled);
string fooBarString = @"foo bar";

Match match1 = re.Match(fooBarString, 4);
Console.WriteLine(String.Format("Match 1 success: {0}", match1.Success));

Match match2 = re.Match(fooBarString.Substring(4));
Console.WriteLine(String.Format("Match 2 success: {0}", match2.Success));

is:

Match 1 success: False

Match 2 success: True

?

Expected behaviour is of course "True" and "True" (or else I really don't know what the "startat" parameter is supposed to be useful for).

The idea is that this regex matching (and there are lots of these patterns) is called very often (several thousand times per second), and we discovered that the Substring operations are killing memory performance.

Thanks for your help!

asked May 04 '11 14:05 by Vincent de Lagabbe



1 Answer

According to MSDN:

If you want to restrict a match so that it begins at a particular character position in the string and the regular expression engine does not scan the remainder of the string for a match, anchor the regular expression with a \G (at the left for a left-to-right pattern, or at the right for a right-to-left pattern). This restricts the match so it must start exactly at startat.

With startat, the regex is still matched against the entire string, so ^ keeps anchoring at index 0 rather than at startat. You will need to use \G instead of ^.

http://msdn.microsoft.com/en-us/library/3583dcyh.aspx
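To make the difference concrete, here is a minimal sketch that reuses the string from the question and swaps ^ for \G. The class name StartAtDemo and the variable names are only illustrative; the pattern and the startat value come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class StartAtDemo
{
    static void Main()
    {
        string input = "foo bar";

        // ^ anchors to index 0 of the whole input, even when startat is 4.
        Regex caret = new Regex("^bar", RegexOptions.Compiled);
        Console.WriteLine(caret.Match(input, 4).Success);    // False

        // \G anchors to the position where the search starts (startat).
        Regex anchored = new Regex(@"\Gbar", RegexOptions.Compiled);
        Console.WriteLine(anchored.Match(input, 4).Success); // True

        // \G also stops the engine from scanning ahead:
        // there is no "bar" at index 0, so this does not match.
        Console.WriteLine(anchored.Match(input, 0).Success); // False
    }
}
```

No Substring call is involved, so this should avoid the extra string allocations mentioned in the question. If the patterns themselves cannot be changed, the Regex.Match(String, Int32, Int32) overload may be worth checking as well: its documentation describes it as treating the given range as if it were the whole input for anchors such as ^, while still reporting match indexes relative to the full string.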

answered Oct 21 '22 17:10 by Maxime Brugidou