Task:
My goal is to find all numbered lines in procedures of my Code Modules. The CodeModule.Find method can be used to check for search terms (target parameter).
Syntax:
object.Find(target, startline, startcol, endline, endcol [, wholeword] [, matchcase] [, patternsearch])
The referring help site https://msdn.microsoft.com/en-us/library/aa443952(v=vs.60).aspx states: parameter patternsearch: Optional. A Boolean value specifying whether or not the target string is a regular expression pattern. If True, the target string is a regular expression pattern. False is the default.
As explained above the find method allows a regex pattern search, which I would like to use in order to identify numbered lines in a precise way: digits followed by a tab. The example below therefore defines a search string s and sets the last parameter PatternSearch in the .Find method to True.
Problem AFAIK a valid regex definition could be
s = "[0-9]{1,4}[ \t]"
but that doesn't show anything, not even an error.
In order to show at least any results, I defined the search term
s = "[0-9]*[ \t]*)"
in the calling example procedure ListNumberedLines showing erratic results.
Question
Is there any possibility to use a valid regex patternsearch in the CodeModule.Find method?
Example code
Option Explicit
' ==============
' Example Search
' ==============
Sub ListNumberedLines()
' Declare search pattern string s
Dim S As String
10 S = "[0-9]*[ \t]*)"
20 Debug.Print "Search Term: " & S
30 Call findWordInModules(S)
End Sub
Public Sub findWordInModules(ByVal sSearchTerm As String)
' Purpose: find modules ('components') with lines containing a search term
' Method: .CodeModule.Find with last parameter patternsearch set to True
' Based on https://www.devhut.net/2016/02/24/vba-find-term-in-vba-modulescode/
' VBComponent requires reference to Microsoft Visual Basic for Applications Extensibility
' or keep it as is and use Late Binding instead
' Declare module variable oComponent
Dim oComponent As Object 'VBComponent
For Each oComponent In Application.VBE.ActiveVBProject.VBComponents
If oComponent.CodeModule.Find(sSearchTerm, 1, 1, -1, -1, False, False, True) = True Then
Debug.Print "Module: " & oComponent.Name 'Name of the current module in which the term was found (at least once)
'Need to execute a recursive listing of where it is found in the module since it could be found more than once
Call listLinesinModuleWhereFound(oComponent, sSearchTerm)
End If
Next oComponent
End Sub
Sub listLinesinModuleWhereFound(ByVal oComponent As Object, ByVal sSearchTerm As String)
' Purpose: list module lines containing a search term
' Method: .CodeModule.Find with last parameter patternsearch set to True
Dim lTotalNoLines As Long 'total number of lines within the module being examined
Dim lLineNo As Long 'will return the line no where the term is found
lLineNo = 1
With oComponent ' Module
lTotalNoLines = .CodeModule.CountOfLines
Do While .CodeModule.Find(sSearchTerm, lLineNo, 1, -1, -1, False, False, True) = True
Debug.Print vbTab & "Zl. " & lLineNo & "|" & _
Trim(.CodeModule.Lines(lLineNo, 1)) 'Remove any padding spaces
lLineNo = lLineNo + 1 'Restart the search at the next line looking for the next occurence
Loop
End With
End Sub
As @MatsMug says, parsing VBA with Regex is hard impossible, but line-numbers are a simpler case, and should be findable with regex alone.
Fortunately, line numbers can only appear within a procedure body (including before the End Sub/Function/Property
statement), so we know they'll never be the first line of your code.
Unfortunately, you can prefix a line-label with 0 or more line continuations:
Sub Foo()
_
_
10 Beep
End Sub
Furthermore, a line number isn't always followed by a space - it can be followed by an instruction separator, giving the line-number the appearance of a line-label:
Sub foo()
10: Beep
End Sub
And if you're code is evil, you might encounter a negative line-number (entered by using hex notation - which VBE dutifully pretty prints back to the code-pane with a leading space and a negative number):
Sub foo()
10 Beep
-1 Beep
End Sub
And we also need to be able to identify numbers that appear on a continued line, that aren't line-numbers:
Sub foo()
Debug.Print _
5 & "is not a line-number"
End Sub
So, here's some evil line-numbering, with a mix of all of those edge-cases:
Option Explicit
Sub foo()
5: Beep
_
_
_
10 Beep
20 _
'Debug.Print _
30
50: Beep
40 Beep
_
-1 _
Beep 'The "-1" line number is achieved by entering "&HFFFFFFFF"
Debug.Print _
2 & "is not a line-number"
60 End Sub
And here's some regex that identifies the line-numbers:
(?<! _)\n( _\n)* ?(?<line_number>(?:\-)?\d+)[: ]
And here's a syntax highlight from regex101:
For the longest time, Rubberduck was struggling with properly/formally parsing line numbers - our work-around was to remove them (replacing them with spaces) before feeding the code module contents to our parser.
Recently we've managed to formally define line numbers:
// lineNumberLabel should actually be "statement-label" according to MS VBAL but they only allow lineNumberLabels:
// A <statement-label> that occurs as the first element of a <list-or-label> element has the effect
// as if the <statement-label> was replaced with a <goto-statement> containing the same
// <statement-label>. This <goto-statement> takes the place of <line-number-label> in
// <statement-list>.
listOrLabel :
lineNumberLabel (whiteSpace? COLON whiteSpace? sameLineStatement?)*
| (COLON whiteSpace?)? sameLineStatement (whiteSpace? COLON whiteSpace? sameLineStatement?)*
;
sameLineStatement : blockStmt;
And lineNumberLabel
is defined as:
//Statement labels can only appear at the start of a line.
statementLabelDefinition : {_input.La(-1) == NEWLINE}? (combinedLabels | identifierStatementLabel | standaloneLineNumberLabel);
identifierStatementLabel : unrestrictedIdentifier whiteSpace? COLON;
standaloneLineNumberLabel :
lineNumberLabel whiteSpace? COLON
| lineNumberLabel;
combinedLabels : lineNumberLabel whiteSpace identifierStatementLabel;
lineNumberLabel : numberLiteral;
(full Antlr4 grammar here)
Notice the predicate {_input.La(-1) == NEWLINE}?
, which force the parser rule to only match a statementLabelDefinition
at the start of a line - a logical line of code.
You see VBA code has physical code lines, like what you're getting from the CodeModule
's contents. But VBA code also has a concept of logical code lines, and it turns out that is all the parser cares about.
This would trip any typical regex:
Sub DoSomething()
Debug.Print _
42
End Sub
There's only 1 logical line of code between the signature and the End Sub
token, but a simple Find
will happily consider that 42
as a "line number" ...which it isn't - it's the argument passed to Debug.Print
, in the same instruction, on the same logical code line, but on the next physical code line.
And you can't be dealing with logical code lines without first pre-processing your input, to take line continuation tokens into account.
And in order to do that, you need to actually parse the instructions you're seeing - at least know where they start and where they end... and that's no small undertaking! see ThunderFrame's answer
The VBIDE API is extremely limited, and won't be helpful for that.
TL;DR: You can't parse VBA code with regular expressions alone. So, nope. Sorry! you need a much more complex regex pattern than that - see ThunderFrame's answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With