VBA regex - match an expression that doesn't begin with a particular word

Tags: regex, vba
I'm looking to create a VBA regular expression that will find the existence of two particular strings inside a set of parentheses.

For example, in this expression:

(aaa, bbb, ccc, ddd, xxx aaa)

it should somehow tell me that it found both "aaa" AND "xxx aaa" in the expression. That is, since there is a match on "aaa" without the "xxx " in front, and there is also a match on "xxx aaa" later on in the expression, it should return true. Since these two sequences can appear in either order, the reverse should also be true.

So I'm thinking the expression/s would be something like this:

"(xxx aaa"[^x][^x][^x][^x]aaa)"

to find the words in one order and

"(aaa"[^x][^x][^x][^x]xxx aaa)"

for the words in another order.

Does this make sense? Or is there a better approach?

I know this is changing the spec, but there is one important addendum - there cannot be any interceding parentheses between the terms.

So for example, this shouldn't match:

(aaa, bbb, ccc, ddd, (eee, xxx aaa))

In other words I'm trying to look in between a matching set of parentheses only.

Jack BeNimble asked Nov 03 '22 19:11


2 Answers

Zero-width look-ahead assertions are your friend.

Function FindInParen(str As String, term1 As String, term2 As String) As Boolean
  Dim re As New VBScript_RegExp_55.RegExp

  re.Pattern = "\(" & _
               "(?=[^()]*\))" & _
               "(?=[^()]*\b" & RegexEscape(term1) & "\b)" & _
               "(?=[^()]*\b" & RegexEscape(term2) & "\b)"

  FindInParen = re.Test(str)
End Function

Function RegexEscape(str As String) As String
  With New VBScript_RegExp_55.RegExp
    .Pattern = "[.+*?^$|\[\](){}\\]"
    .Global = True
    RegexEscape = .Replace(str, "\$&")
  End With
End Function

This pattern reads as:

  • Starting from an opening paren, check:
    • that a closing paren follows with no other parens in between
    • that term1 occurs before the closing paren
    • that term2 occurs before the closing paren

Since I'm using look-ahead ((?=...)), the regex engine never actually moves forward on the string, so I can chain as many look-ahead assertions as I like and have them all checked from the same position (right after the opening paren). A side effect is that the order in which term1 and term2 occur in the string doesn't matter.
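To illustrate the chaining, here is a minimal sketch (not part of the original answer) that builds one look-ahead per term from a ParamArray. The names FindAllInParen and EscapeTerm are hypothetical, and the sketch assumes late binding via CreateObject("VBScript.RegExp") so that it runs without a project reference.

```vba
' Hypothetical helper: escape regex metacharacters in a literal term.
Private Function EscapeTerm(ByVal term As String) As String
  With CreateObject("VBScript.RegExp")
    .Pattern = "([.+*?^$|\[\](){}\\])"
    .Global = True
    EscapeTerm = .Replace(term, "\$1")   ' prefix each metacharacter with "\"
  End With
End Function

' Hypothetical generalisation: True if some "(" is followed by a ")" with no
' parens in between, and every term occurs before that ")".
Function FindAllInParen(ByVal str As String, ParamArray terms() As Variant) As Boolean
  Dim pat As String
  Dim i As Long

  ' Opening paren, then a look-ahead for a paren-free run up to ")".
  pat = "\((?=[^()]*\))"

  ' One look-ahead per term. None of them consumes input, so the order
  ' of the terms in the string is irrelevant.
  For i = LBound(terms) To UBound(terms)
    pat = pat & "(?=[^()]*\b" & EscapeTerm(CStr(terms(i))) & "\b)"
  Next i

  With CreateObject("VBScript.RegExp")
    .Pattern = pat
    FindAllInParen = .Test(str)
  End With
End Function
```

In the Immediate Window, ? FindAllInParen("(aaa, bbb, ccc, ddd, xxx aaa)", "aaa", "xxx aaa", "ccc") should print True, and replacing "ccc" with a term that is absent, such as "zzz", should print False.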

I tested it on the console ("Immediate Window"):

? FindInParen("(aaa, bbb, ccc, ddd, xxx aaa)", "aaa", "xxx aaa")
True

? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "aaa", "xxx aaa")
True

? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "bbb", "xxx aaa")
False

Notes:

  • The second test yields True because, technically, both aaa and xxx aaa are inside the same set of parens: the match starts at the inner opening paren.
  • The VBScript regex flavor cannot deal with nested structures (it has no recursion or balancing groups), so you will not get nested parentheses right with it. You will never be able to find "a matching set of parens" with this regex alone, only an opening/closing pair that has no other parens in between. Write a parser if you need to handle nesting.
  • Add a reference to "Microsoft VBScript Regular Expressions 5.5" to your project (Tools > References in the VBA editor).

FWIW, here's a minimal nesting-aware function that works for the second test case above:

' str is passed ByVal because the loop below overwrites it
Function FindInParen(ByVal str As String, term1 As String, term2 As String) As Boolean
  Dim parenPair As New VBScript_RegExp_55.RegExp
  Dim terms As New VBScript_RegExp_55.RegExp
  Dim matches As VBScript_RegExp_55.MatchCollection

  FindInParen = False
  parenPair.Pattern = "\([^()]*\)"
  terms.Pattern = "(?=.*?[(,]\s*(?=\b" & RegexEscape(Trim(term1)) & "\b))" & _
                  "(?=.*?[(,]\s*(?=\b" & RegexEscape(Trim(term2)) & "\b))"

  Do
    Set matches = parenPair.Execute(str)
    If matches.Count Then
      If terms.Test(matches(0).Value) Then
        Debug.Print "found here: " & matches(0).Value
        FindInParen = True
      End If
      str = parenPair.Replace(str, "[...]")
    End If
  Loop Until FindInParen Or matches.Count = 0

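  If Not FindInParen Then
    Debug.Print "not found"
    ' All balanced pairs have been collapsed by now, so any paren left is unmatched.
    ' InStr takes the string to search first, then the string to look for.
    If InStr(str, "(") > 0 Or InStr(str, ")") > 0 Then
      Debug.Print "mis-matched parens"
    End If
  End If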
End Function

Console:

? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "aaa", "xxx aaa")
not found
False

? FindInParen("(aaa, bbb, ccc, ddd, (eee, xxx aaa))", "eee", "xxx aaa")
found here: (eee, xxx aaa)
True
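As a rough illustration of the "write a parser" route, here is a minimal sketch (not part of the original answer) that scans the string once and tracks nesting depth instead of using a regex. The names FindInParenScan and HasItem are hypothetical, and the sketch assumes that the terms are whole comma-separated items, compared after trimming.

```vba
' Hypothetical sketch: True if some balanced paren group directly contains
' both terms as comma-separated items (text inside nested groups is excluded).
Function FindInParenScan(ByVal str As String, ByVal term1 As String, ByVal term2 As String) As Boolean
  Dim buf() As String   ' buf(d) collects the direct content at depth d
  Dim depth As Long, i As Long
  Dim ch As String

  ReDim buf(0 To 0)
  For i = 1 To Len(str)
    ch = Mid$(str, i, 1)
    Select Case ch
      Case "("
        ' Entering a new group: start an empty buffer for it.
        depth = depth + 1
        ReDim Preserve buf(0 To depth)
        buf(depth) = ""
      Case ")"
        If depth = 0 Then Exit Function   ' unbalanced ")": report False
        ' Leaving a group: check its direct content for both terms.
        If HasItem(buf(depth), term1) And HasItem(buf(depth), term2) Then
          FindInParenScan = True
          Exit Function
        End If
        depth = depth - 1
      Case Else
        If depth > 0 Then buf(depth) = buf(depth) & ch
    End Select
  Next i
End Function

' Hypothetical helper: True if term equals one of the comma-separated items.
Private Function HasItem(ByVal items As String, ByVal term As String) As Boolean
  Dim part As Variant
  For Each part In Split(items, ",")
    If Trim$(part) = Trim$(term) Then
      HasItem = True
      Exit Function
    End If
  Next part
End Function
```

Under these assumptions, the two console cases above give the same results (False for "aaa" with "xxx aaa", True for "eee" with "xxx aaa"). Because items are compared whole, "aaa" does not match inside "xxx aaa".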
Tomalak answered Nov 15 '22 06:11


It's not really clear from your question exactly what you want (and maybe Regexp is not really needed here) but this might be close:

Sub Tester()
    RegexpTest ("(aaa, bbb, ccc, ddd, xxx aaa)")
End Sub


Sub RegexpTest(txt As String)
    Dim re As Object
    Dim allMatches, m

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "([^,\(]*aaa)"
    re.IgnoreCase = True
    re.Global = True

    Set allMatches = re.Execute(txt)

    For Each m In allMatches
        Debug.Print Trim(m)
    Next m

End Sub
Tim Williams answered Nov 15 '22 05:11