Using Regex in .NET
I will have a set of data that comes in something like this:
< Bunch o' Data Here >
where < is just the indicator of a new record and > is the end of the record.
These records may come in like this:
< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>
They could also come in as:
< Dataset 1>Dataset 2 broken, no opening tag ><dataset 3>
although I'm not certain that this latter case is possible, and I'll cross that bridge when I have to.
I'm trying to use Regex to split these into records based on the start and end characters, ultimately producing something like this:
Match 1 = < Dataset 1>
Match 2 = <Dataset 2 broken, no closing tag
Match 3 = <dataset 3>
I'm trying to figure out how non-capturing groups work, and maybe my understanding is wrong.
<.*?(?:<|>)
gets me pretty close, I think, except that it includes the opening character of the 3rd set of data in the match for the second record.
I also suspect that ?: is not doing what it needs to: if I take it out, it returns the same set of matches (2).
It looks like you have it flipped. You'll want to use ?: to not capture a group, not :?.
<.*?(?:<|>)
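To illustrate the difference, here is a minimal sketch using Python's re module as a stand-in for .NET's Regex (the group syntax shown is shared by both engines; the variable names are only illustrative). It compares the non-capturing form with the flipped form, which is an ordinary capturing group containing an optional colon. It also shows that the overall match text is the same either way, which is why removing ?: appears to change nothing.

```python
import re

data = "< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>"

# (?:<|>) groups the alternation without capturing it.
non_capturing = re.compile(r"<.*?(?:<|>)")
# (:?<|>) is a plain capturing group: an optional ':' followed by '<', or '>'.
flipped = re.compile(r"<.*?(:?<|>)")

# Number of capture groups each pattern defines.
group_counts = (non_capturing.groups, flipped.groups)

# The full match text (group 0) does not depend on whether a group captures.
non_capturing_matches = [m.group(0) for m in non_capturing.finditer(data)]
flipped_matches = [m.group(0) for m in flipped.finditer(data)]

print(group_counts)           # (0, 1)
print(non_capturing_matches)  # two matches; the second ends with the next record's '<'
```

A non-capturing group only affects what is stored in the numbered groups. It still consumes the characters it matches, so the second match swallows the < that should start the third record.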
To expand a bit: a ? immediately after the opening parenthesis of a group signals that the group is something other than a plain capture group. Following it with : means "do not capture", but other characters can follow the ? to get other behaviour. Common ones are look-ahead (?=...) and look-behind (?<=...), but there are many others.
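As a rough sketch of how a look-ahead could apply to the original problem (this pattern is an illustration, not something proposed in the thread), the closing alternative can be written so that a following < is checked but not consumed. Python's re is used here as a stand-in; the look-ahead syntax is the same in .NET.

```python
import re

data = "< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>"

# A record starts with '<' and ends either at a consumed '>' or just before
# the next '<'. The look-ahead (?=<) asserts that '<' follows without
# including it in the match, so the next record can still start there.
record = re.compile(r"<.*?(?:>|(?=<))")

matches = [m.group(0) for m in record.finditer(data)]

for number, text in enumerate(matches, start=1):
    print(f"Match {number} = {text!r}")
```

Under these assumptions the broken second record is returned without the opening character of the third one, although it keeps its trailing space. A broken record at the very end of the input would not match this pattern as written.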
I also just realized the scope of what you're trying to match (beyond the non-capturing issue). The language of matched parens/brackets/etc is not regular, so - assuming I'm understanding your purpose correctly - you'd need to create a fairly elaborate extended regular expression in order to match what you want. There are a couple of other SO questions about this, including this one which has some discussion about it.
What about something simple like this: <[^<>]+>|[^<>]+>|<[^<>]+
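A quick way to check this suggestion against both sample inputs from the question, sketched with Python's re module (the pattern uses only character classes and alternation, so it should behave the same with .NET's Regex.Matches; the variable names are illustrative):

```python
import re

# Three alternatives, tried in order at each position:
#   <[^<>]+>   a complete record
#   [^<>]+>    a record missing its opening '<'
#   <[^<>]+    a record missing its closing '>'
record = re.compile(r"<[^<>]+>|[^<>]+>|<[^<>]+")

no_closing = "< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>"
no_opening = "< Dataset 1>Dataset 2 broken, no opening tag ><dataset 3>"

no_closing_matches = record.findall(no_closing)
no_opening_matches = record.findall(no_opening)

print(no_closing_matches)
print(no_opening_matches)
```

Because [^<>] can never cross a delimiter, each match stops at the next < or > without consuming a character that belongs to the neighbouring record.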