
RegEx to capture repeating fields?

Tags: c#, regex

I have a buffer I'm trying to parse with regular expressions.

Here's an example of the buffer:

DATA#ALPHAONE;BETATWO.CHARLIETHREE!

The format is as follows: the buffer always starts with the literal header "DATA#". After that come one or more text fields, each terminated by a semicolon, period, or exclamation mark.

My Regex pattern (in C#) so far is:

string singleFieldPattern = "(?'Field'.*?)(?'Separator'[;.!])";
string fullBufferPattern = "(?'Header'DATA#)(" + singleFieldPattern + ")+";

The problem comes when I try to dump the data that matched:

Regex response = new Regex(fullBufferPattern);
string example = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";

Debug.WriteLine("RegEx Matches?: {0}", response.IsMatch(example));  
foreach (Match m in response.Matches(example))
{
    foreach(string s in new string[]{"Header", "Field", "Separator"}) 
    {
        Debug.WriteLine("{0} : {1}", s, m.Groups[s]);
    }
}

The only output is:

RegEx Matches?: True
Header : DATA#
Field : CHARLIETHREE
Separator : !

I intended the output to be:

RegEx Matches?: True
Header : DATA#
Field : ALPHAONE
Separator : ;
Field : BETATWO
Separator : .
Field : CHARLIETHREE
Separator : !

My expression did not get the earlier fields, ALPHAONE and BETATWO (and their Separators of ; and .) as I intended. It only captured the last field (CHARLIETHREE).

How can I get all the parts that matched singleFieldPattern?


I've simplified my data format above for the purposes of the question, but since some people want the real data, here is something much closer to it:

(Note: values in [ ] are single unprintable bytes, and the spaces are for clarity only.)

Example:

[SYN] % SYSNAMScanner[ACK]; BAUDRATE57600[ACK]; CTRLMODEXON[ACK];

Translation:
The System Name (SYSNAM) is "Scanner"
The baud rate is 57,600
The Flow Control is XON

abelenky asked Feb 16 '23 22:02


2 Answers

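When a group sits inside a repeated construct, its value only reflects the last iteration, which is why m.Groups["Field"] shows just CHARLIETHREE. .NET still records every iteration in the group's Captures collection. This bit of LINQ pairs up the captured fields and separators from your regex: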

var ms = response.Matches(example);
foreach (Match m in ms)
{
    string header = m.Groups["Header"].Value;
    Debug.WriteLine("Header : " + header);
    var pairs = m.Groups["Field"].Captures.Cast<Capture>().Zip(
                    m.Groups["Separator"].Captures.Cast<Capture>(),
                    (f, s) => new { Field = f.Value, Separator = s.Value });
    foreach (var pair in pairs)
    {
        Debug.WriteLine(pair.ToString());
    }
}

This outputs:

Header : DATA#
{ Field = ALPHAONE, Separator = ; }
{ Field = BETATWO, Separator = . }
{ Field = CHARLIETHREE, Separator = ! }
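
The same Captures collections can also be walked by index, without LINQ. The following is only a minimal sketch: the class name CaptureDemo and the method Describe are made up for illustration, and the pattern is the one from the question rewritten with the equivalent <name> syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Same pattern as in the question, using (?<name>...) instead of (?'name'...).
    private static readonly Regex BufferPattern =
        new Regex(@"(?<Header>DATA#)((?<Field>.*?)(?<Separator>[;.!]))+");

    // Hypothetical helper: returns "Label : value" lines for every capture.
    public static List<string> Describe(string buffer)
    {
        var lines = new List<string>();
        foreach (Match m in BufferPattern.Matches(buffer))
        {
            lines.Add("Header : " + m.Groups["Header"].Value);

            CaptureCollection fields = m.Groups["Field"].Captures;
            CaptureCollection separators = m.Groups["Separator"].Captures;

            // Field and Separator are each captured once per iteration of the
            // outer repeated group, so the two collections have the same length.
            for (int i = 0; i < fields.Count; i++)
            {
                lines.Add("Field : " + fields[i].Value);
                lines.Add("Separator : " + separators[i].Value);
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("DATA#ALPHAONE;BETATWO.CHARLIETHREE!"))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the example buffer this prints the header followed by the three Field/Separator pairs, in the order the question asked for.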
Tim S. answered Feb 23 '23 10:02


If you don't mind a bit of LINQ, you can do this:

string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
var fullBufferPattern = @"(?<header>DATA#)(?<fields>.+)[;.!]";
var fieldPattern = @"(?<field>[^;.!]+)[;.!]?";

var fields = Regex.Matches(data, fullBufferPattern)
                    .OfType<Match>()
                    .SelectMany(
                        m =>
                        Regex.Matches(m.Groups["fields"].Value, fieldPattern)
                             .OfType<Match>())
                    .Select(m => m.Groups["field"].Value).ToArray();

The fields array will contain:

ALPHAONE    
BETATWO
CHARLIETHREE

Edit: To reproduce your Debug output, use:

string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
var fullBufferPattern = @"(?<header>DATA#)(?<fields>([^;.!]+[;.!])+)";
var fieldPattern = @"(?<field>[^;.!]+)(?<separator>[;.!])";

var groups = Regex.Matches(data, fullBufferPattern)
                  .OfType<Match>()
                  .Select(
                      m =>
                      new
                      {
                          Header = m.Groups["header"],
                          Fields = Regex.Matches(m.Groups["fields"].Value, fieldPattern)
                                        .OfType<Match>()
                                        .Select(f => new
                                            {
                                                Field = f.Groups["field"],
                                                Separator = f.Groups["separator"]
                                            })
                      });

foreach (var element in groups)
{
    Debug.WriteLine("Header : {0}", element.Header);
    foreach (var field in element.Fields)
    {
        Debug.WriteLine("Field : {0}", field.Field);
        Debug.WriteLine("Separator : {0}", field.Separator);
    }
}

Output is:

Header : DATA#
Field : ALPHAONE
Separator : ;
Field : BETATWO
Separator : .
Field : CHARLIETHREE
Separator : !
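
The same two-pass idea might carry over to the real data shown in the question. This is only a sketch under assumptions that the question does not confirm: that the header is exactly SYN (0x16) followed by '%', that every field ends with ACK (0x06) followed by ';', and that no field contains an ACK byte. The names RealBufferDemo and ExtractFields are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RealBufferDemo
{
    // Assumed layout: SYN, '%', then one or more fields, each ended by ACK and ';'.
    private const string FullBufferPattern = @"\u0016%(?<fields>(?:[^\u0006]+\u0006;)+)";
    private const string FieldPattern = @"(?<field>[^\u0006]+)\u0006;";

    // Hypothetical helper: returns the raw text of each field, e.g. "BAUDRATE57600".
    public static string[] ExtractFields(string buffer)
    {
        return Regex.Matches(buffer, FullBufferPattern)
                    .OfType<Match>()
                    .SelectMany(m => Regex.Matches(m.Groups["fields"].Value, FieldPattern)
                                          .OfType<Match>())
                    .Select(f => f.Groups["field"].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // The example from the question with the clarifying spaces removed.
        string buffer = "\u0016%SYSNAMScanner\u0006;BAUDRATE57600\u0006;CTRLMODEXON\u0006;";
        foreach (string field in ExtractFields(buffer))
        {
            Console.WriteLine(field);
        }
    }
}
```

This yields SYSNAMScanner, BAUDRATE57600 and CTRLMODEXON. Splitting each entry into a name and a value (SYSNAM versus Scanner) would need a list of known field names, which the sketch does not attempt.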
Simon Belanger answered Feb 23 '23 10:02