
Fixing RegEx Split() function - Empty string as first entry

Tags: c#, .net, regex, split

Here is the code up front, to illustrate the problem I am facing.

This is the text that needs to be split:
:20:0444453880181732
:21:0444453880131350
:22:CANCEL/ABCDEF0131835055
:23:BUY/CALL/E/EUR
:82A:ABCDEFZZ80A
:87A:4444655604
:30:061123
:31G:070416/1000/USNY
:31E:070418
:26F:PRINCIPAL
:32B:EUR1000000,00
:36:1,31000000
:33B:USD1310000,00
:37K:PCT1,60000000
:34P:061127USD16000,00
:57A:ABCDEFZZ80A

This is my regex and the code that uses it:

 Regex r = new Regex(@"\:\d{2}\w*\:", RegexOptions.Multiline);

 MatchCollection matches = r.Matches(Content);
 string[] items = r.Split(Content);

 // ----- Fix for first entry being empty string.
 int index = items[0] == string.Empty ? 1 : 0;

 foreach (Match match in matches)
 {
    MessageField field = new MessageField();

    field.FieldIdExtended = match.Value;
    field.Content = items[index];

    Fields.Add(field);

    index++;
 }

As you can see from the comment, the problem occurs when splitting the string: Split() returns an empty string as the first item. Is there an elegant way to solve this?

Thanks, Dimi

Dimi Takis asked Nov 12 '22 23:11


1 Answer

You get this behaviour because the first delimiter in the text has nothing before it, and thus the first entry returned by the split is an empty string.
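To make that concrete, here is a minimal sketch that reproduces the empty first entry. The two-line input and the `SplitDemo` / `SplitOnTags` names are made up for illustration; only the regex comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on the ":NN:"-style tags from the question.
    public static string[] SplitOnTags(string content)
    {
        var r = new Regex(@":\d{2}\w*:", RegexOptions.Multiline);
        return r.Split(content);
    }

    public static void Main()
    {
        // The input starts with a delimiter, so nothing precedes the first match.
        string[] items = SplitOnTags(":20:AAA\n:21:BBB");

        Console.WriteLine(items.Length);          // prints 3
        Console.WriteLine(items[0].Length == 0);  // prints True
    }
}
```

The three entries are the empty text before `:20:`, then `AAA` (with its trailing newline), then `BBB`.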

The way to solve this properly is probably to capture the value that you want in the regular expression and then just get it from your match set.

At a rough first guess you probably want something like:

Regex r = new Regex(@"^:(?<id>\d{2}\w*):(?<content>.*)$", RegexOptions.Multiline);

MatchCollection matches = r.Matches(Content);

foreach (Match match in matches)
{
    MessageField field = new MessageField();

    field.FieldIdExtended = match.Groups["id"].ToString();
    field.Content = match.Groups["content"].ToString();

    Fields.Add(field);

}

The named capture groups make it easy to extract each part. You may need to tweak the regex to suit your needs: for the first line it currently captures 20 as the id (without the surrounding colons) and 0444453880181732 as the content. I wasn't 100% clear on what you needed to capture, but you seem comfortable with regex, so I assume that isn't a problem. :)

Essentially, you are not really trying to split the text here, but to match each field and pull its parts out.
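For reference, a self-contained sketch of the match-based approach might look like the following. The `MessageField` class here is a hypothetical stand-in for the type in the question, and `FieldParser` / `Parse` are illustrative names. One deliberate change from the regex above: the content group uses `[^\r\n]*` instead of `.*`, on the assumption that the input may use Windows line endings, in which case `.*` would leave a trailing `\r` in the content.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's MessageField type.
public class MessageField
{
    public string FieldIdExtended { get; set; }
    public string Content { get; set; }
}

public static class FieldParser
{
    // ^ anchors at each line start (Multiline); [^\r\n]* stops at the line end
    // without capturing a trailing '\r' from "\r\n" line endings.
    private static readonly Regex FieldRegex = new Regex(
        @"^:(?<id>\d{2}\w*):(?<content>[^\r\n]*)",
        RegexOptions.Multiline);

    public static List<MessageField> Parse(string text)
    {
        var fields = new List<MessageField>();
        foreach (Match match in FieldRegex.Matches(text))
        {
            fields.Add(new MessageField
            {
                FieldIdExtended = match.Groups["id"].Value,
                Content = match.Groups["content"].Value
            });
        }
        return fields;
    }

    public static void Main()
    {
        string sample = ":20:0444453880181732\r\n:82A:ABCDEFZZ80A\r\n:32B:EUR1000000,00";
        foreach (MessageField f in Parse(sample))
        {
            Console.WriteLine(f.FieldIdExtended + " => " + f.Content);
        }
    }
}
```

Because every field comes from a match, there is no leading empty entry to skip and no index to keep in sync with a separate split array.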

Chris answered Nov 15 '22 13:11