Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
The groups() method of a match object (in Python's re module) returns a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.
Non-capturing groups are important constructs within Java Regular Expressions. They create a sub-pattern that functions as a single unit but does not save the matched character sequence. In this tutorial, we'll explore how to use non-capturing groups in Java Regular Expressions.
To access the text matched by each group, pass the group's number to the match object's group(group_number) method. The first group is number 1, the second is number 2, and so on. This works as long as the pattern actually matched.
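As a small illustration of the points above, here is a sketch using Python's standard re module. The pattern and the sample string are made up for this example; the non-capturing group (?:...) is included to show that it does not get a number.

```python
import re

# Two capturing groups with a non-capturing group in between.
m = re.search(r"(\d+)-(?:\w+)-(\d+)", "id 42-abc-7")

all_groups = m.groups()  # tuple of capturing groups only, numbered from 1
first = m.group(1)       # text matched by the first capturing group
second = m.group(2)      # text matched by the second capturing group
whole = m.group(0)       # group 0 is always the entire match

print(all_groups, first, second, whole)
```

The (?:\w+) part takes part in the match but is skipped in the numbering, which is why groups() has only two entries.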
(you can skip to What if... if you get bored with intros)
This question is not about VBScript in particular (I just happened to use it in this case): I want to find a solution for general regular expression usage (editors included).
This started when I wanted to create an adaptation of Example 4 where 3 capture groups are used to split data across 3 cells in MS Excel. I needed to capture one entire pattern and then, within it, capture 3 other patterns. However, in the same expression, I also needed to capture another kind of pattern and again capture 3 other patterns within it (yeah I know... but before pointing the nutjob finger, please finish reading).
I thought first of Named Capturing Groups then I realized that I should not «mix named and numbered capturing groups» since it «is not recommended because flavors are inconsistent in how the groups are numbered».
Then I looked into VBScript SubMatches and «non-capturing» groups and I got a working solution for a specific case:
For Each C In Myrange
    ' Two alternative "parent" patterns wrapped in one non-capturing group
    strPattern = "(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
    If strPattern <> "" Then
        strInput = C.Value
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        Set rgxMatches = regEx.Execute(strInput)
        For Each mtx In rgxMatches
            If mtx.SubMatches(0) <> "" Then
                ' 1st parent pattern matched: use capture groups 1 to 3
                C.Offset(0, 1) = mtx.SubMatches(0)
                C.Offset(0, 2) = mtx.SubMatches(1)
                C.Offset(0, 3) = mtx.SubMatches(2)
            ElseIf mtx.SubMatches(3) <> "" Then
                ' 2nd parent pattern matched: use capture groups 4 to 6
                C.Offset(0, 1) = mtx.SubMatches(3)
                C.Offset(0, 2) = mtx.SubMatches(4)
                C.Offset(0, 3) = mtx.SubMatches(5)
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        Next
    End If
Next
Here's a demo in Rubular of the regex. With these inputs:
124;12;3
my id1:213 my id2:232 my word:ins4yanrgx
:8587459 :18254182540215 :dcpt
0;1;2
It returns the first 2 cells with numbers and the 3rd with a number or a word. Basically I used a non-capturing group with 2 "parent" patterns ("parents" = broad patterns where I want to detect other sub-patterns). If the 1st parent pattern has a matching sub-pattern (1st capture group) then I place its value and the remaining captured groups of this pattern in the 3 cells. If not, I check if the 4th capture group (belonging to the 2nd parent pattern) was matched and place the remaining sub-patterns in the same 3 cells.
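For readers who don't use VBScript, the same idea can be sketched in Python. This is only an illustration of the logic above, not a port of the Excel macro: split_cells is a hypothetical helper name, and note that Python reports an unmatched group as None where VBScript's SubMatches gives an empty string.

```python
import re

# Same pattern as in the VBScript code: two "parent" alternatives,
# each with three capture groups (1-3 and 4-6).
PATTERN = re.compile(
    r"(?:^([0-9]+);([0-9]+);([0-9]+)$"
    r"|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
)

def split_cells(text):
    """Return the three captured values, or None if neither parent matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    groups = m.groups()
    if groups[0] is not None:
        # 1st parent pattern matched: groups 1 to 3 are filled
        return groups[0:3]
    # Otherwise the 2nd parent pattern matched: groups 4 to 6 are filled
    return groups[3:6]

for sample in ["124;12;3", ":8587459 :18254182540215 :dcpt"]:
    print(sample, "->", split_cells(sample))
```

The group numbers keep counting across the alternation, which is exactly why the second branch has to be read from groups 4 to 6.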
Instead of having something like this:
(?:^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever))
Something like this could be possible:
(#:^(\d+);(\d+);(\d+)$)|(#:^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(#:what(ever))
Where (#: instead of creating a non-capturing group, would create a "parent" numbered capture group.
In this way I could do something similar to Example 4:
C.Offset(0, 1) = regEx.Replace(strInput, "#$1")
C.Offset(0, 2) = regEx.Replace(strInput, "#$2")
C.Offset(0, 3) = regEx.Replace(strInput, "#$3")
It would search parent patterns until it finds a match in a child pattern (the first match would be returned and, ideally, wouldn't search the remaining ones).
Is there something like this already? Or am I missing something in regex that already allows this?
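To make the intended behaviour concrete, here is a rough emulation in Python. It is not a regex feature and the (#: syntax remains hypothetical: the sketch simply keeps each "parent" as its own compiled pattern, tries them in order, and stops at the first one that matches, so the child groups are always numbered from 1 within their parent. PARENTS and first_parent_match are names invented for this example.

```python
import re

# One compiled pattern per "parent"; child groups restart at 1 in each.
PARENTS = [
    re.compile(r"^(\d+);(\d+);(\d+)$"),
    re.compile(r"^.*:(\d+)\s.*:(\d+).*:(\w+)$"),
    re.compile(r"what(ever)"),
]

def first_parent_match(text):
    """Return (parent_number, child_groups) for the first parent that matches."""
    for number, parent in enumerate(PARENTS, start=1):
        m = parent.search(text)
        if m:
            # Stop here: the remaining parents are not searched.
            return number, m.groups()
    return None

print(first_parent_match("124;12;3"))
print(first_parent_match("whatever"))
```

With this layout, "child 1 of whichever parent matched" is just the first element of the returned tuple. One difference from a single alternation is that the parents are tried strictly in list order over the whole input, rather than position by position.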
Other possible variations:
#2$3 (this would be equivalent of $6 in my example);
(#:^_(?:(#:(\d+):\w+-(\d))|(#:\w+:(\d+)-(\d+)))_$)|(#:^\w+:\s+(#:(\w+);\d-(\d+))$)
and fetching ##$1 in patterns like:
_123:smt-4_ (it would match in: 123)
_ott:432-10_ (it would match in: 432)
yant: special;3-45235 (it would match in: special)
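The nested variation can be approximated today with named groups only, so that named and numbered groups are never mixed. The sketch below is an assumption-laden emulation in Python: the (#: markers are replaced by ordinary grouping, every capture gets an invented name (a1, a2, b1, ...), and the hypothetical helper first_child plays the role of ##$1 by returning the first child of whichever nested parent matched.

```python
import re

# The nested pattern from above, with (#: replaced by plain grouping
# and every capture group given a name.
NESTED = re.compile(
    r"^_(?:(?P<a1>\d+):\w+-(?P<a2>\d)|\w+:(?P<b1>\d+)-(?P<b2>\d+))_$"
    r"|^\w+:\s+(?P<c1>\w+);\d-(?P<c2>\d+)$"
)

def first_child(text):
    """Emulate ##$1: first child of the nested parent that matched."""
    m = NESTED.search(text)
    if m is None:
        return None
    for name in ("a1", "b1", "c1"):
        if m.group(name) is not None:
            return m.group(name)
    return None

for sample in ["_123:smt-4_", "_ott:432-10_", "yant: special;3-45235"]:
    print(sample, "->", first_child(sample))
```

The lookup order in first_child has to be maintained by hand, which is the bookkeeping the proposed syntax would make unnecessary.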
Please tell me if you noticed any mistakes or flaws in this logic, I will edit asap.