
regex exclude match from capture

Tags: c#, regex

Using Regex in .NET

I will have a set of data that comes in looking something like this:

< Bunch o' Data Here >

where < marks the start of a new record and > marks the end of the record.

These records may come in like this:

< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>

They could also come in as:

< Dataset 1>Dataset 2 broken, no opening tag ><dataset 3>

I'm not certain that this latter case is possible, though, and I'll cross that bridge when I have to.

I'm trying to use Regex to split these into records based on the start and end characters, ultimately producing something like this:

Match 1 = < Dataset 1>
Match 2 = <Dataset 2 broken, no closing tag 
Match 3 = <dataset 3>

I'm trying to figure out how non-capturing groups work, and maybe my understanding is wrong.

<.*?(?:<|>)

gets me pretty close, I think, except that it includes the opening character of the 3rd set of data in the match for the second record. I also suspect that ?: is not doing what I need it to: if I take it out, the pattern returns the same set of matches (2).

Beta033 asked Dec 23 '22 00:12


2 Answers

It looks like you have it flipped. To make a group non-capturing, use ?: rather than :?.

 <.*?(?:<|>)

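To expand a bit: a ? immediately after the opening parenthesis of a group signals a special construct. A : makes the group non-capturing, and other characters after the ? select other behaviors. Common ones are look-ahead, (?=...), and look-behind, (?<=...), but there are many others. Note that a non-capturing group still takes part in the match: the text it matches is consumed and included in the overall result, it just isn't stored as a numbered group. To test for a character without including it in the match, use a look-ahead.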
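As a rough illustration of that difference, the sketch below runs the pattern from the question next to a look-ahead variant on the first sample input. The second pattern, <.*?(?:>|(?=<)), is only one possible rewrite and an assumption on my part, not something proposed in the question; the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class NonCapturingDemo
{
    static void Main()
    {
        string input = "< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>";

        // Non-capturing group: the delimiter is not stored as a numbered group,
        // but it is still consumed, so it shows up in m.Value.
        foreach (Match m in Regex.Matches(input, @"<.*?(?:<|>)"))
            Console.WriteLine($"[{m.Value}] groups: {m.Groups.Count}");

        Console.WriteLine();

        // Look-ahead variant (assumed rewrite): a closing > is consumed,
        // while a following < is only checked for and left for the next match.
        foreach (Match m in Regex.Matches(input, @"<.*?(?:>|(?=<))"))
            Console.WriteLine($"[{m.Value}]");
    }
}
```

With the first pattern the second match ends in the < that belongs to the third record, and nothing is left to start a third match, so there are only two results. With the look-ahead the < is not consumed, so the third record matches on its own; the second match keeps its trailing space. A broken record at the very end of the input would still be missed by this variant.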

I also just realized the scope of what you're trying to match (beyond the non-capturing issue). The language of matched parens/brackets/etc is not regular, so - assuming I'm understanding your purpose correctly - you'd need to create a fairly elaborate extended regular expression in order to match what you want. There are a couple of other SO questions about this with some discussion of the problem.

eldarerathis answered Feb 11 '23 14:02


What about something simple like this: <[^<>]+>|[^<>]+>|<[^<>]+
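A minimal sketch of how that alternation could be tried out in .NET, using the two sample inputs from the question. The class name and the output format are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

class SplitRecordsDemo
{
    // Pattern from the answer above, in order of alternatives:
    // a complete record, a record missing its <, a record missing its >.
    const string Pattern = @"<[^<>]+>|[^<>]+>|<[^<>]+";

    static void Main()
    {
        string[] inputs =
        {
            "< Dataset 1><Dataset 2 broken, no closing tag <dataset 3>",
            "< Dataset 1>Dataset 2 broken, no opening tag ><dataset 3>",
        };

        foreach (string input in inputs)
        {
            int n = 0;
            foreach (Match m in Regex.Matches(input, Pattern))
                Console.WriteLine($"Match {++n} = {m.Value}");
            Console.WriteLine();
        }
    }
}
```

Both inputs should split into three matches each. Because every alternative uses + rather than *, an empty record such as <> is skipped entirely; whether that is acceptable depends on the data.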

Dan at Demand answered Feb 11 '23 14:02