I have a buffer I'm trying to parse with regular expressions.
Here's an example of the buffer:
DATA#ALPHAONE;BETATWO.CHARLIETHREE!
The format is: the buffer always starts with the literal header "DATA#". After that come one or more text fields, separated by a semicolon, period or exclamation mark.
My Regex pattern (in C#) so far is:
string singleFieldPattern = "(?'Field'.*?)(?'Separator'[;.!])";
string fullBufferPattern = "(?'Header'DATA#)(" + singleFieldPattern + ")+";
The problem comes when I try to dump the data that matched:
using System.Diagnostics;
using System.Text.RegularExpressions;

Regex response = new Regex(fullBufferPattern);
string example = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
Debug.WriteLine("RegEx Matches?: {0}", response.IsMatch(example));
foreach (Match m in response.Matches(example))
{
    foreach (string s in new string[] { "Header", "Field", "Separator" })
    {
        Debug.WriteLine("{0} : {1}", s, m.Groups[s]);
    }
}
The only output is:
RegEx Matches?: True
Header : DATA#
Field : CHARLIETHREE
Separator : !
I intended the output to be:
RegEx Matches?: True
Header : DATA#
Field : ALPHAONE
Separator : ;
Field : BETATWO
Separator : .
Field : CHARLIETHREE
Separator : !
My expression did not capture the earlier fields, ALPHAONE and BETATWO (and their separators, ; and .), as I intended. It only captured the last field (CHARLIETHREE).
How can I get all the parts that matched singleFieldPattern?
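A minimal way to see what is going on, assuming C# 9 top-level statements and using Console.WriteLine in place of Debug.WriteLine so it runs outside a debugger: the pattern is the one from the question, and the sketch prints both Group.Value and the contents of Group.Captures for the Field group.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern, unchanged.
string singleFieldPattern = "(?'Field'.*?)(?'Separator'[;.!])";
string fullBufferPattern = "(?'Header'DATA#)(" + singleFieldPattern + ")+";

Match m = Regex.Match("DATA#ALPHAONE;BETATWO.CHARLIETHREE!", fullBufferPattern);
Group field = m.Groups["Field"];

// Value (and ToString) report only the last repetition of the repeated group...
Console.WriteLine("Value: " + field.Value);

// ...but every repetition is kept in Captures, in the order it was matched.
foreach (Capture c in field.Captures)
{
    Console.WriteLine("Capture at {0}: {1}", c.Index, c.Value);
}
```

Value reports CHARLIETHREE, while Captures lists ALPHAONE, BETATWO and CHARLIETHREE together with their positions in the input.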
(Note: values in [ ] are single unprintable bytes, and spaces are for clarity only.)
Example:
[SYN] % SYSNAMScanner[ACK]; BAUDRATE57600[ACK]; CTRLMODEXON[ACK];
Translation:
The System Name (SYSNAM) is "Scanner"
The baud rate is 57,600
The Flow Control is XON
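Applied to the real buffer above, the Captures approach might look like the sketch below. It rests on assumptions the question does not state: [SYN] is the ASCII control byte 0x16 and [ACK] is 0x06, % is the literal header, every field ends with [ACK] followed by ;, and keys come from a hypothetical fixed list (knownKeys), because the example shows no delimiter between a key and its value. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed encoding: [SYN] = 0x16, [ACK] = 0x06, '%' is the header,
// and each field is terminated by [ACK] followed by ';'.
string buffer = "\u0016%SYSNAMScanner\u0006;BAUDRATE57600\u0006;CTRLMODEXON\u0006;";

// Hypothetical list of known keys: the question does not say how a key is
// separated from its value, so this sketch matches on known prefixes.
string[] knownKeys = { "SYSNAM", "BAUDRATE", "CTRLMODE" };

Dictionary<string, string> Parse(string input)
{
    // Anchored so that junk before the header or after the last field fails.
    Match m = Regex.Match(input, @"^\x16%(?:(?<Field>[^\x06]+)\x06;)+$");
    if (!m.Success)
        throw new FormatException("Buffer does not match the expected layout.");

    var result = new Dictionary<string, string>();
    // Every repetition of the Field group is available through Captures.
    foreach (Capture c in m.Groups["Field"].Captures)
    {
        var key = knownKeys.FirstOrDefault(k => c.Value.StartsWith(k, StringComparison.Ordinal));
        if (key == null)
            throw new FormatException("Unknown field: " + c.Value);
        result[key] = c.Value.Substring(key.Length);
    }
    return result;
}

var settings = Parse(buffer);
foreach (var kv in settings)
{
    Console.WriteLine("{0} = {1}", kv.Key, kv.Value);
}
```

Matching keys by prefix only works if no known key is a prefix of another; if the real protocol uses fixed-width keys or some other rule, the key/value split would need to change accordingly.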
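In .NET, when a capturing group sits inside a quantifier such as +, Group.Value (and therefore Group.ToString()) holds only the last repetition; every repetition is still kept in Group.Captures. This bit of LINQ pairs up the Field and Separator captures from your regex: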
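using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

var ms = response.Matches(example);
foreach (Match m in ms)
{
    string header = m.Groups["Header"].Value;
    Debug.WriteLine("Header : " + header);
    // Each repetition of the group adds one Capture to Field and one to Separator,
    // so the two CaptureCollections line up index by index.
    var pairs = m.Groups["Field"].Captures.Cast<Capture>().Zip(
        m.Groups["Separator"].Captures.Cast<Capture>(),
        (f, s) => new { Field = f.Value, Separator = s.Value });
    foreach (var pair in pairs)
    {
        Debug.WriteLine(pair.ToString());
    }
}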
This outputs:
Header : DATA#
{ Field = ALPHAONE, Separator = ; }
{ Field = BETATWO, Separator = . }
{ Field = CHARLIETHREE, Separator = ! }
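If you would rather not use LINQ (Enumerable.Zip needs .NET Framework 4.0 or later), a plain index loop over the two CaptureCollections does the same pairing and makes it easy to check that the counts agree. This is a sketch using C# 9 top-level statements and Console.WriteLine; the pairs list exists only so the result can be inspected afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The question's pattern, unchanged.
string singleFieldPattern = "(?'Field'.*?)(?'Separator'[;.!])";
string fullBufferPattern = "(?'Header'DATA#)(" + singleFieldPattern + ")+";
string example = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";

var pairs = new List<(string Field, string Separator)>();
foreach (Match m in Regex.Matches(example, fullBufferPattern))
{
    CaptureCollection fields = m.Groups["Field"].Captures;
    CaptureCollection separators = m.Groups["Separator"].Captures;

    // Both groups live in the same repeated group, so each successful
    // repetition should contribute exactly one capture to each collection.
    if (fields.Count != separators.Count)
        throw new InvalidOperationException("Field/separator counts differ.");

    Console.WriteLine("Header : " + m.Groups["Header"].Value);
    for (int i = 0; i < fields.Count; i++)
    {
        pairs.Add((fields[i].Value, separators[i].Value));
        Console.WriteLine("Field : {0}", fields[i].Value);
        Console.WriteLine("Separator : {0}", separators[i].Value);
    }
}
```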
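If you don't mind a bit of LINQ, you can do it in two passes: one regex pulls out the text after the header, and a second splits that text into fields: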
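using System.Linq;
using System.Text.RegularExpressions;

string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
// First pass: everything between the header and the final separator.
var fullBufferPattern = @"(?<header>DATA#)(?<fields>.+)[;.!]";
// Second pass: one field, optionally followed by its separator.
var fieldPattern = @"(?<field>[^;.!]+)[;.!]?";
var fields = Regex.Matches(data, fullBufferPattern)
    .OfType<Match>()
    .SelectMany(m => Regex.Matches(m.Groups["fields"].Value, fieldPattern)
                          .OfType<Match>())
    .Select(m => m.Groups["field"].Value)
    .ToArray();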
The variable fields will have:
ALPHAONE
BETATWO
CHARLIETHREE
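Another option, not used in either answer, is Regex.Split: when the split pattern contains a capturing group, the captured separators are returned in the array alongside the fields. A sketch, assuming a plain StartsWith check is enough for the header and using C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
const string header = "DATA#";

if (!data.StartsWith(header, StringComparison.Ordinal))
    throw new FormatException("Missing DATA# header.");

// The capturing group makes Regex.Split keep each separator in the result.
// Because the buffer ends with a separator, the array ends with an empty string.
string[] parts = Regex.Split(data.Substring(header.Length), "([;.!])");

// parts alternates field, separator, field, separator, ...
for (int i = 0; i + 1 < parts.Length; i += 2)
{
    Console.WriteLine("Field : {0}", parts[i]);
    Console.WriteLine("Separator : {0}", parts[i + 1]);
}
```

Unlike the [^;.!]+ patterns above, Split does not reject empty fields (DATA#;; yields empty strings rather than failing), and this loop silently skips a final field that has no trailing separator, so it suits buffers that are already known to be well-formed.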
Edit: To reproduce your Debug output, use:
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

string data = "DATA#ALPHAONE;BETATWO.CHARLIETHREE!";
// The fields group now covers every field together with its separator.
var fullBufferPattern = @"(?<header>DATA#)(?<fields>([^;.!]+[;.!])+)";
// Each match of this pattern is one field plus the separator that ends it.
var fieldPattern = @"(?<field>[^;.!]+)(?<separator>[;.!])";
var groups = Regex.Matches(data, fullBufferPattern)
    .OfType<Match>()
    .Select(m => new
    {
        Header = m.Groups["header"],
        Fields = Regex.Matches(m.Groups["fields"].Value, fieldPattern)
            .OfType<Match>()
            .Select(f => new
            {
                Field = f.Groups["field"],
                Separator = f.Groups["separator"]
            })
    });
foreach (var element in groups)
{
    // Group.ToString() returns the captured text, so {0} prints the value.
    Debug.WriteLine("Header : {0}", element.Header);
    foreach (var field in element.Fields)
    {
        Debug.WriteLine("Field : {0}", field.Field);
        Debug.WriteLine("Separator : {0}", field.Separator);
    }
}
Output is:
Header : DATA#
Field : ALPHAONE
Separator : ;
Field : BETATWO
Separator : .
Field : CHARLIETHREE
Separator : !