
What is the C# equivalent of the branch reset operator ("?|") found in PHP (PCRE)?

The following regular expression will match "Saturday" or "Sunday" : (?:(Sat)ur|(Sun))day

But for "Saturday" backreference 1 is filled while backreference 2 is empty, and for "Sunday" it is the other way around.

PHP (PCRE) provides a nice operator "?|" that circumvents this problem. The previous regex would become (?|(Sat)ur|(Sun))day. Both alternatives then capture into group 1, so there are no empty backreferences.

Is there an equivalent in C#, or some workaround?

Stephan asked Mar 21 '11 12:03



2 Answers

It should be possible to concatenate backreference 1 and backreference 2.
Exactly one of them is always empty, and concatenating a string with an empty string leaves it unchanged.

With your regex (?:(Sat)ur|(Sun))day and the replacement $1$2,
you get Sat for Saturday and Sun for Sunday.

 regex (?:(Sat)ur|(Sun))day
 input    | backref1 _$1_ | backref2 _$2_ | 'concat' _$1$2_
 ---------|---------------|---------------|----------------
 Saturday | 'Sat'         | ''            | 'Sat'+'' = Sat
 Sunday   | ''            | 'Sun'         | ''+'Sun' = Sun

Instead of reading backref1 or backref2, read both and concatenate the results.
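In C#, the concatenation idea might look like the minimal sketch below. The class name ConcatDemo and the method name Abbreviate are made up for illustration; the sketch relies on the fact that a .NET group which did not participate in the match has an empty Value.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the question.
public static class ConcatDemo
{
    // Same pattern as in the question: only one of groups 1 and 2 participates.
    private static readonly Regex DayPattern =
        new Regex(@"(?:(Sat)ur|(Sun))day");

    public static string Abbreviate(string input)
    {
        Match m = DayPattern.Match(input);
        if (!m.Success)
        {
            return ""; // no match: nothing to abbreviate
        }
        // The group that did not participate contributes an empty string,
        // so the concatenation is whichever abbreviation was captured.
        return m.Groups[1].Value + m.Groups[2].Value;
    }
}
```

Calling ConcatDemo.Abbreviate("Saturday") yields "Sat" and ConcatDemo.Abbreviate("Sunday") yields "Sun". For a plain search-and-replace, Regex.Replace with the replacement string "$1$2" should have the same effect.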

bw_üezi answered Sep 29 '22 18:09


.NET doesn't support the branch-reset operator, but it does support named groups, and it lets you reuse group names without restriction (something no other flavor does, AFAIK). So you could use this:

(?:(?<abbr>Sat)ur|(?<abbr>Sun))day

...and the abbreviated name will be stored in Match.Groups["abbr"].
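As a rough sketch of how this could be used from C# (the class name NamedGroupDemo and the method name Abbreviate are assumptions made for the example, not part of the answer):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the answer.
public static class NamedGroupDemo
{
    // Both alternatives capture into the same named group "abbr".
    private static readonly Regex DayPattern =
        new Regex(@"(?:(?<abbr>Sat)ur|(?<abbr>Sun))day");

    public static string Abbreviate(string input)
    {
        Match m = DayPattern.Match(input);
        // Whichever alternative matched, its capture is available under "abbr".
        return m.Success ? m.Groups["abbr"].Value : "";
    }
}
```

With this, NamedGroupDemo.Abbreviate("Saturday") yields "Sat" and NamedGroupDemo.Abbreviate("Sunday") yields "Sun", without checking which alternative matched.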

Alan Moore answered Sep 29 '22 20:09