Regex: capturing groups within capture groups

Intro

(You can skip to "What if..." if you get bored with intros.)

This question is not specific to VBScript (I just happened to use it here): I want a solution for regular expressions in general (editors included).

This started when I wanted to adapt Example 4, where 3 capture groups are used to split data across 3 cells in MS Excel. I needed to capture one entire pattern and then, within it, capture 3 other patterns. However, in the same expression, I also needed to capture another kind of pattern and again capture 3 other patterns within it (yeah, I know... but before pointing the nutjob finger, please finish reading).

I first thought of named capturing groups, but then I realized that I should not «mix named and numbered capturing groups», since it «is not recommended because flavors are inconsistent in how the groups are numbered».
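To make "inconsistent numbering" concrete: in Python's standard `re` module (used here only as an example flavor), named and unnamed groups share a single left-to-right numbering, whereas .NET numbers the unnamed groups first and the named groups after them. A minimal Python check of the first behavior:

```python
import re

# One named group between two unnamed ones
m = re.search(r"(\d+);(?P<middle>\d+);(\d+)", "124;12;3")

# Python's re numbers every group by the position of its opening parenthesis,
# named or not, so "middle" is simply group 2
print(m.group(1), m.group(2), m.group(3))   # 124 12 3
print(m.group("middle") == m.group(2))      # True
print(m.re.groupindex["middle"])            # 2
```

The same pattern in a flavor that numbers named groups last would put `middle` at group 3, which is why mixing the two styles is risky when code relies on `$n`.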

Then I looked into VBScript's SubMatches and «non-capturing» groups, and got a working solution for a specific case:

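' Hypothetical wrapper Sub: Myrange is the range of cells to split
Sub SplitIntoCells(Myrange As Range)
    Dim regEx As Object, rgxMatches As Object, mtx As Object
    Dim C As Range
    Dim strPattern As String, strInput As String

    ' Build the pattern and the RegExp object once, not once per cell
    ' (late binding: no reference to "Microsoft VBScript Regular Expressions 5.5" needed)
    strPattern = "(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With

    For Each C In Myrange
        strInput = C.Value
        Set rgxMatches = regEx.Execute(strInput)

        ' Execute returns an empty collection when nothing matches, so the
        ' "(Not matched)" case must be tested here, not inside the loop below
        If rgxMatches.Count = 0 Then C.Offset(0, 1) = "(Not matched)"

        For Each mtx In rgxMatches
            If mtx.SubMatches(0) <> "" Then
                ' 1st parent pattern matched: capture groups 1-3
                C.Offset(0, 1) = mtx.SubMatches(0)
                C.Offset(0, 2) = mtx.SubMatches(1)
                C.Offset(0, 3) = mtx.SubMatches(2)
            ElseIf mtx.SubMatches(3) <> "" Then
                ' 2nd parent pattern matched: capture groups 4-6
                C.Offset(0, 1) = mtx.SubMatches(3)
                C.Offset(0, 2) = mtx.SubMatches(4)
                C.Offset(0, 3) = mtx.SubMatches(5)
            End If
        Next
    Next
End Sub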

Here's a demo of the regex on Rubular. Given these lines:

124;12;3
my id1:213 my id2:232 my word:ins4yanrgx
:8587459 :18254182540215 :dcpt
0;1;2

It fills the first 2 cells with numbers and the 3rd with a number or a word. Basically, I used a non-capturing group with 2 "parent" patterns ("parents" = broad patterns in which I want to detect other sub-patterns). If the 1st parent pattern has a matching sub-pattern (1st capture group), I place its value and the remaining captured groups of this pattern in the 3 cells. If not, I check whether the 4th capture group (belonging to the 2nd parent pattern) was matched, and place it and the remaining sub-patterns in the same 3 cells.
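Since the question is about regex in general, here is the same "which parent matched?" logic as a minimal Python sketch using the standard `re` module and the sample lines above (`split_cell` is a hypothetical name). Two differences from VBScript to keep in mind: Python's group numbers start at 1 while `SubMatches` is 0-based (so `SubMatches(3)` corresponds to `group(4)`), and a group that did not take part in the match is `None` rather than an empty value.

```python
import re

# Same pattern as the VBA code: two "parent" alternatives in one non-capturing group
PATTERN = re.compile(
    r"(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
)

def split_cell(value):
    """Return the 3 fields for one cell, or None if neither parent matched."""
    m = PATTERN.search(value)
    if m is None:
        return None
    if m.group(1) is not None:    # 1st parent matched: groups 1-3
        return m.group(1, 2, 3)
    return m.group(4, 5, 6)       # otherwise the 2nd parent matched: groups 4-6

for line in ["124;12;3",
             "my id1:213 my id2:232 my word:ins4yanrgx",
             ":8587459 :18254182540215 :dcpt",
             "0;1;2"]:
    print(split_cell(line))
# ('124', '12', '3')
# ('213', '232', 'ins4yanrgx')
# ('8587459', '18254182540215', 'dcpt')
# ('0', '1', '2')
```

The hard-coded checks on groups 1 and 4 are exactly the part that grows with every new parent pattern.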

What if...

Instead of having something like this:

(?:^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever))

Something like this could be possible:

(#:^(\d+);(\d+);(\d+)$)|(#:^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(#:what(ever))

Where (#:, instead of creating a non-capturing group, would create a "parent" numbered capture group. That way I could do something similar to Example 4:

C.Offset(0, 1) = regEx.Replace(strInput, "#$1")
C.Offset(0, 2) = regEx.Replace(strInput, "#$2")
C.Offset(0, 3) = regEx.Replace(strInput, "#$3")

It would search the parent patterns in order until it finds a match in a child pattern (the first match would be returned and, ideally, the remaining parent patterns wouldn't be searched).
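Something close to `(#:` can be emulated without new syntax. A minimal Python sketch, assuming each parent is written as an ordinary capturing group: Python's `Match.lastindex` is the number of the last group that closed, and a parent closes after its children, so after a match it points at the parent branch that succeeded; its children are the groups numbered right after it. Alternation already stops at the first branch that matches, which covers the "wouldn't search the remaining ones" part. `PARENTS` and `child` are hypothetical names. (For comparison, Perl and PCRE have a "branch reset" group, `(?|...)`, in which every alternative numbers its groups from the same starting point; VBScript's engine does not support it.)

```python
import re

# The "(#:" pattern with every "(#:" written as a plain "(":
#   parent 1 = group 1 (children 2-4)
#   parent 2 = group 5 (children 6-8)
#   parent 3 = group 9 (child 10)
PATTERN = re.compile(
    r"(^(\d+);(\d+);(\d+)$)|(^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(what(ever))"
)
PARENTS = [1, 5, 9]  # group numbers of the parents, left to right (kept in sync by hand)

def child(match, n):
    """Emulate the hypothetical "#$n": n-th child of the parent that matched."""
    parent = match.lastindex  # a parent closes after its children, so this is the parent
    i = PARENTS.index(parent)
    last = PARENTS[i + 1] - 1 if i + 1 < len(PARENTS) else match.re.groups
    if not 1 <= n <= last - parent:
        return ""  # this parent has no n-th child
    return match.group(parent + n)

for line in ["124;12;3", "my id1:213 my id2:232 my word:ins4yanrgx", "whatever"]:
    m = PATTERN.search(line)
    print([child(m, n) for n in (1, 2, 3)])
# ['124', '12', '3']
# ['213', '232', 'ins4yanrgx']
# ['ever', '', '']
```

Each parent is still an ordinary numbered group, so plain `$n` references keep working; the cost is that `PARENTS` must be updated whenever the pattern changes.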

Is there something like this already? Or am I missing some regex feature that makes this possible?

Other possible variations:

refer to the parent and child pattern directly, e.g. #2$3 (this would be equivalent to $6 in my example);
create as many capturing groups as necessary within others (I guess this would be more complex, but also the most interesting part), e.g. with a regex (same syntax) like (#:^_(?:(#:(\d+):\w+-(\d))|(#:\w+:(\d+)-(\d+)))_$)|(#:^\w+:\s+(#:(\w+);\d-(\d+))$) and fetching ##$1 in strings like:

_123:smt-4_ it would return: 123
_ott:432-10_ it would return: 432
yant: special;3-45235 it would return: special
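The nested variation can be sketched the same way in Python, under one more assumption: each `(#:` becomes a named group whose name starts with `p` (Python needs unique names, so `p1`, `p1a`, `p1b`, `p2` and `p2a` are made up here). Along any matched path, a nested parent is numbered after the parent that encloses it, so the highest-numbered parent that matched is the innermost one, and `##$1` becomes "the group right after it". This is a sketch for this one pattern, not a general implementation of the `(#:` syntax, and `innermost_child` is a hypothetical name.

```python
import re

# The nested example, with each "(#:" turned into a named group "p..."
PATTERN = re.compile(
    r"(?P<p1>^_(?:(?P<p1a>(\d+):\w+-(\d))|(?P<p1b>\w+:(\d+)-(\d+)))_$)"
    r"|(?P<p2>^\w+:\s+(?P<p2a>(\w+);\d-(\d+))$)"
)

def innermost_child(match, n):
    """Emulate the hypothetical "##$n": n-th child of the deepest matched parent.

    Assumes n does not exceed that parent's number of children.
    """
    matched_parents = [
        idx for name, idx in match.re.groupindex.items()
        if name.startswith("p") and match.group(idx) is not None
    ]
    deepest = max(matched_parents)  # nested parents get higher group numbers
    return match.group(deepest + n)

for s in ["_123:smt-4_", "_ott:432-10_", "yant: special;3-45235"]:
    print(innermost_child(PATTERN.search(s), 1))
# 123
# 432
# special
```

Unique names are what make the parents identifiable here; with numbers alone, the code would again need a hand-maintained list of which groups are parents.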

Please tell me if you notice any mistakes or flaws in this logic; I will edit ASAP.
