Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting Text Between Brackets with Regex

Tags:

regex

excel

vba

In sentences like:

"[x] Alpha

[33] Beta"

I extract an array of bracketed data as ([x], [33])

using VBA regex Pattern:

"(\[x\])|(\[\d*\])"

I cannot extract directly the array of un-bracketed data as (x, 33)

using web resources advice for pattern

"(?<=\[)(.*?)(?=\])"

Is this a VBA specific problem (i.e. limits on its implementation of Regex) or did I misunderstand 'looking forward and backward' patterns?

Public Function Regx( _
  ByVal SourceString As String, _
  ByVal Pattern As String, _
  Optional ByVal IgnoreCase As Boolean = True, _
  Optional ByVal MultiLine As Boolean = True, _
  Optional ByVal MatchGlobal As Boolean = True) _
  As Variant

Dim oMatch As Match
Dim arrMatches
Dim lngCount As Long

' Initialize to an empty array
arrMatches = Array()
With New RegExp
    .MultiLine = MultiLine
    .IgnoreCase = IgnoreCase
    .Global = MatchGlobal
    .Pattern = Pattern
    For Each oMatch In .Execute(SourceString)
        ReDim Preserve arrMatches(lngCount)
        arrMatches(lngCount) = oMatch.Value
        lngCount = lngCount + 1
    Next
End With


Sub testabove()
    Call Regx("[x] Alpha" & Chr(13) & _
      "[33] Beta", "(\[x\])|(\[\d*\])")
End Sub
like image 307
Ziad El Hachem Avatar asked Mar 10 '23 19:03

Ziad El Hachem


2 Answers

Use capturing around the subpatterns that will fetch you your required value.

Use

"\[(x)\]|\[(\d*)\]"

(or \d+ if you need to match at least 1 digit, as * means zero or more occurrences, and + means one or more occurrences).

Or, use the generic pattern to extract anything inside the square brackets without the brackets:

"\[([^\][]+)]"

Then, access the right Submatches index by checking the submatch length (since you have an alternation, either of the submatch will be empty), and there you go. Just change your for loop with

For Each oMatch In .Execute(SourceString)
    ReDim Preserve arrMatches(lngCount)
    If Len(oMatch.SubMatches(0)) > 0 Then
        arrMatches(lngCount) = oMatch.SubMatches(0)
    Else
        arrMatches(lngCount) = oMatch.SubMatches(1)
    End If
    ' Debug.Print arrMatches(lngCount) ' - This outputs x and 33 with your data
    lngCount = lngCount + 1
Next
like image 166
Wiktor Stribiżew Avatar answered Mar 13 '23 09:03

Wiktor Stribiżew


With Excel and VBA you can strip the brackets after the regex extraction:

Sub qwerty()

    Dim inpt As String, outpt As String
    Dim MColl As MatchCollection, temp2 As String
    Dim regex As RegExp, L As Long

    inpt = "38c6v5hrk[x]537fhvvb"

    Set regex = New RegExp
    regex.Pattern = "(\[x\])|(\[\d*\])"
    Set MColl = regex.Execute(inpt)
    temp2 = MColl(0).Value

    L = Len(temp2) - 2
    outpt = Mid(temp2, 2, L)

    MsgBox inpt & vbCrLf & outpt
End Sub

enter image description here

like image 33
Gary's Student Avatar answered Mar 13 '23 09:03

Gary's Student