I'm looking to create a VBA regular expression that will find the existence of two particular strings inside a set of parentheses.
For example, in this expression:
(aaa, bbb, ccc, ddd, xxx aaa)
it should somehow tell me that it found both "aaa" AND "xxx aaa" in the expression. I.e, since there is a match on "aaa" without the "xxxx " in front, and there is also a match on "xxx aaa" later on in the expression, it should return true. Since these two sequences can appear in either order, the reverse should also be true.
So I'm thinking the expression/s would be something like this:
"(xxx aaa"[^x][^x][^x][^x]aaa)"
to find the words in one order and
"(aaa"[^x][^x][^x][^x]xxx aaa)"
for the words in another order.
Does this make sense? Or is there a better approach?
I know this is changing the spec, but there is one important addendum - there cannot be any interceding parentheses between the terms.
So for example, this should't match:
(aaa, bbb, ccc, ddd, (eee, xxx aaa))
In other words I'm trying to look in between a matching set of parentheses only.
Zero-width look-ahead asserttions are your friend.
Function FindInParen(str As String, term1 As String, term2 As String) As Boolean
Dim re As New VBScript_RegExp_55.RegExp
re.Pattern = "\(" & _
"(?=[^()]*)\)" & _
"(?=[^()]*\b" & RegexEscape(term1) & "\b)" & _
"(?=[^()]*\b" & RegexEscape(term2) & "\b)"
FindInParen = re.Test(str)
End Function
Function RegexEscape(str As String) As String
With New VBScript_RegExp_55.RegExp
.Pattern = "[.+*?^$|\[\](){}\\]"
.Global = True
RegexEscape = .Replace(str, "\$&")
End With
End Function
This pattern reads as:
term1
occurs before the closing parenterm2
occurs before the closing parenSince I'm using look-ahead ((?=...)
), the regex engine never actually moves forward on the string, so I can chain as many look-ahead assertions and have them all checked. A side-effect is that the order in which term1
and term2
occur in the string doesn't matter.
I tested it on the console ("Immediate Window"):
? FindInParen("(aaa, bbb, ccc, ddd, xxx aaa)", "aaa", "xxx aaa")
True
? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "aaa", "xxx aaa")
True
? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "bbb", "xxx aaa")
False
Notes:
True
because—technically—both aaa
and xxx aaa
are inside the same set of parens.FWIW, here's a minimal nesting-aware function that works for the second test case above:
Function FindInParen(str As String, term1 As String, term2 As String) As Boolean
Dim parenPair As New VBScript_RegExp_55.RegExp
Dim terms As New VBScript_RegExp_55.RegExp
Dim matches As VBScript_RegExp_55.MatchCollection
FindInParen = False
parenPair.Pattern = "\([^()]*\)"
terms.Pattern = "(?=.*?[(,]\s*(?=\b" & RegexEscape(Trim(term1)) & "\b))" & _
"(?=.*?[(,]\s*(?=\b" & RegexEscape(Trim(term2)) & "\b))"
Do
Set matches = parenPair.Execute(str)
If matches.Count Then
If terms.Test(matches(0).Value) Then
Debug.Print "found here: " & matches(0).Value
FindInParen = True
End If
str = parenPair.Replace(str, "[...]")
End If
Loop Until FindInParen Or matches.Count = 0
If Not FindInParen Then
Debug.Print "not found"
End If
If InStr("(", str) > 0 Or InStr(")", str) > 0 Then
Debug.Print "mis-matched parens"
End If
End Function
Console:
? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "aaa", "xxx aaa")
not found
False
? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "eee", "xxx aaa")
found here: (eee, xxx aaa)
True
It's not really clear from your question exactly what you want (and maybe Regexp is not really needed here) but this might be close:
Sub Tester()
RegexpTest ("(aaa, bbb, ccc, ddd, xxx aaa)")
End Sub
Sub RegexpTest(txt As String)
Dim re As Object
Dim allMatches, m
Set re = CreateObject("VBScript.RegExp")
re.Pattern = "([^,\(]*aaa)"
re.ignorecase = True
re.Global = True
Set allMatches = re.Execute(txt)
For Each m In allMatches
Debug.Print Trim(m)
Next m
End Sub
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With