
C# Regex.Split - Subpattern returns empty strings

Tags:

c#

regex

split

Hey, first time poster on this awesome community.

I have a regular expression in my C# application to parse an assignment of a variable:

NewVar = 40

which is entered in a Textbox. I want my regular expression to return (using Regex.Split) the name of the variable and the value, pretty straightforward. This is the Regex I have so far:

var r = new Regex(@"^(\w+)=(\d+)$", RegexOptions.IgnorePatternWhitespace);
var mc = r.Split(command);

My goal was to do the trimming of whitespace in the Regex and not use the Trim() method on the returned values. Currently, it works, but it returns an empty string at the beginning of the returned array and an empty string at the end.

Using the above input example, this is what's returned from Regex.Split:

mc[0] = ""
mc[1] = "NewVar"
mc[2] = "40"
mc[3] = ""

So my question is: why does it return an empty string at the beginning and the end?

Thanks.

ademers asked Nov 14 '09 18:11


2 Answers

The reason Regex.Split is returning four values is that you have exactly one match, so Regex.Split is returning:

  • All the text before your match, which is ""
  • All () groups within your match, which are "NewVar" and "40"
  • All the text after your match, which is ""

Regex.Split's primary purpose is to extract the text between matches of the regex. For example, you could use Regex.Split with a pattern of "[,;]" to split text on either commas or semicolons. When the pattern contains () groups, Regex.Split also includes the captured text in the returned array (how multiple groups are handled changed between .NET Framework 1.1 and 2.0), which is why you are seeing "NewVar" and "40" at all. Without the groups you would get only "" and "".
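
As a small illustration of both behaviors, the sketch below splits on a plain delimiter pattern and then on the anchored pattern from the question. The sample inputs are made up for the example, and "NewVar=40" is written without spaces so that the question's pattern matches it.

```csharp
using System;
using System.Text.RegularExpressions;

// Plain delimiter: only the text between matches is returned.
string[] fields = Regex.Split("a,b;c", "[,;]");
Console.WriteLine(string.Join("|", fields));   // a|b|c

// Pattern with () groups that matches the whole input:
// text before the match, each captured group, text after the match.
string[] parts = Regex.Split("NewVar=40", @"^(\w+)=(\d+)$");
Console.WriteLine(parts.Length);               // 4
```

Here parts holds "", "NewVar", "40" and "", which is the same shape as the result shown in the question.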

What you were looking for is Regex.Match, not Regex.Split. It will do exactly what you want:

var r = new Regex(@"^(\w+)=(\d+)$");
var match = r.Match(command);
// Groups[0] is the entire match; the () captures start at index 1.
var varName = match.Groups[1].Value;
var valueText = match.Groups[2].Value;

Note that RegexOptions.IgnorePatternWhitespace means you can include extra spaces in your pattern. It has nothing to do with whitespace in the matched text. Since you have no extra whitespace in your pattern, it is unnecessary.
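
To do the trimming inside the regex, as the question asks, one option is to allow optional whitespace with \s* around each token. This is a sketch rather than the only way to do it; the sample input and the names command and spaced are assumptions for illustration. The last lines also show that IgnorePatternWhitespace only changes how the pattern is read.

```csharp
using System;
using System.Text.RegularExpressions;

string command = "  NewVar = 40 ";  // sample input, assumed for this sketch

// \s* absorbs optional whitespace in the input, so no Trim() is needed.
var r = new Regex(@"^\s*(\w+)\s*=\s*(\d+)\s*$");
Match match = r.Match(command);
if (match.Success)
{
    string varName = match.Groups[1].Value;    // "NewVar"
    string valueText = match.Groups[2].Value;  // "40"
    Console.WriteLine(varName + " -> " + valueText);
}

// IgnorePatternWhitespace drops the spaces written in the pattern itself;
// it does not make the pattern tolerate spaces in the input.
var spaced = new Regex(@"^ (\w+) = (\d+) $", RegexOptions.IgnorePatternWhitespace);
Console.WriteLine(spaced.IsMatch("NewVar=40"));    // True
Console.WriteLine(spaced.IsMatch("NewVar = 40"));  // False
```

Checking match.Success before reading the groups avoids silently getting empty strings when the input does not have the expected form.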

Ray Burns answered Oct 23 '22 11:10


From the docs, Regex.Split() uses the regular expression as the delimiter to split on. It does not split the captured groups out of the input string. Also, the IgnorePatternWhitespace option ignores unescaped whitespace in your pattern, not in the input.

Instead, try the following:

var r = new Regex(@"\s*=\s*");
var mc = r.Split(command);

Note that the whitespace is actually consumed as a part of the delimiter.
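
For the input from the question, a sketch of what this split produces is shown below. The sample strings and the name padded are assumptions for the example. Only the whitespace around = is consumed; leading or trailing whitespace on the whole input stays attached to the first or last element.

```csharp
using System;
using System.Text.RegularExpressions;

var r = new Regex(@"\s*=\s*");

// The spaces around '=' belong to the delimiter, so they are dropped.
string[] mc = r.Split("NewVar = 40");
Console.WriteLine(mc[0]);  // NewVar
Console.WriteLine(mc[1]);  // 40

// Whitespace at the ends of the input is not part of any delimiter.
string[] padded = r.Split(" NewVar = 40 ");
Console.WriteLine("[" + padded[0] + "][" + padded[1] + "]");  // [ NewVar][40 ]
```

This approach does not validate the input, so a string with no = comes back as a single element and one with several = signs comes back as more than two.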

jheddings answered Oct 23 '22 12:10