I have been trying to create a regular expressions pattern that matches any reference in any Excel formula, including absolute, relative, and external references. I need to return the entire reference, including the worksheet and workbook name.
I haven't been able to find exhaustive documentation about Excel A1-notation, but with a lot of testing I have determined the following:
=A1&"A1"
would break regex=Sheet1!A1
=[Book1.xlsx]Sheet1!A1
'C:\[Book1.xlsx]Sheet1'!A1
='[Book 1.xlsx]Sheet 1'!A1
Range.Formula
still returns references in A1-notation. Range.FormulaR1C1
returns references in R1C1 notation.
=SUM([Book5]Sheet1:Sheet3!A1)
Here is what I came up with wrapped in a VBA procedure for testing. I updated the code to handle names as well:
Sub ReturnFormulaReferences()
Dim objRegExp As New VBScript_RegExp_55.RegExp
Dim objCell As Range
Dim objStringMatches As Object
Dim objReferenceMatches As Object
Dim objMatch As Object
Dim intReferenceCount As Integer
Dim intIndex As Integer
Dim booIsReference As Boolean
Dim objName As Name
Dim booNameFound As Boolean
With objRegExp
.MultiLine = True
.Global = True
.IgnoreCase = True
End With
For Each objCell In Selection.Cells
If Left(objCell.Formula, 1) = "=" Then
objRegExp.Pattern = "\"".*\"""
Set objStringMatches = objRegExp.Execute(objCell.Formula)
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Set objReferenceMatches = objRegExp.Execute(objCell.Formula)
intReferenceCount = 0
For Each objMatch In objReferenceMatches
intReferenceCount = intReferenceCount + 1
Next
Debug.Print objCell.Formula
For intIndex = intReferenceCount - 1 To 0 Step -1
booIsReference = True
For Each objMatch In objStringMatches
If objReferenceMatches(intIndex).FirstIndex > objMatch.FirstIndex _
And objReferenceMatches(intIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
booIsReference = False
Exit For
End If
Next
If booIsReference Then
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7})"
If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'reference is not A1
objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)" _
& "[a-z_\\][a-z0-9_\.]{0,254}$"
If Not objRegExp.Test(objReferenceMatches(intIndex).Value) Then 'name is not external
booNameFound = False
For Each objName In objCell.Worksheet.Parent.Names
If objReferenceMatches(intIndex).Value = objName.Name Then
booNameFound = True
Exit For
End If
Next
If Not booNameFound Then
objRegExp.Pattern = "^(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)"
For Each objName In objCell.Worksheet.Names
If objReferenceMatches(intIndex).Value = objRegExp.Replace(objName.Name, "") Then
booNameFound = True
Exit For
End If
Next
End If
booIsReference = booNameFound
End If
End If
End If
If booIsReference Then
Debug.Print " " & objReferenceMatches(intIndex).Value _
& " (" & objReferenceMatches(intIndex).FirstIndex & ", " _
& objReferenceMatches(intIndex).Length & ")"
End If
Next intIndex
Debug.Print
End If
Next
Set objRegExp = Nothing
Set objStringMatches = Nothing
Set objReferenceMatches = Nothing
Set objMatch = Nothing
Set objCell = Nothing
Set objName = Nothing
End Sub
Can anyone break or improve this? Without exhaustive documentation on Excel's formula syntax it is difficult to know if this is correct.
Thanks!
Does Excel support regex? Regrettably, there are no inbuilt Regex functions in Excel. To be able to use regular expressions in your formulas, you'll have to create your own user-defined function (VBA or . NET based) or install third-party tools supporting regexes.
In regular expressions, we can match any character using period "." character. To match multiple characters or a given set of characters, we should use character classes.
A regular expression (sometimes called a rational expression) is a sequence of characters that define a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations.
The MATCH function searches for a specified item in a range of cells, and then returns the relative position of that item in the range. For example, if the range A1:A3 contains the values 5, 25, and 38, then the formula =MATCH(25,A1:A3,0) returns the number 2, because 25 is the second item in the range.
jtolle steered me in the right direction. As far as I can tell, this is what I was trying to do. I've been testing and it seems to work.
stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula
Thanks jtolle!
I'm a few years late here, but I was looking for something similar and so dug into this. The main pattern you use is this:
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
& "|[a-z]{1,3}\:[a-z]{1,3}" _
& "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
& "|[0-9]{1,7}\:[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Basically you have six alternatives for a range reference (lines 3-8), any of which will produce a match by itself, with two alternatives for an optional filename/sheet name prefix (lines 1-2).
For the two prefix alternatives, the only difference is that the first is wrapped in single quotes, with an extra dot star after the initial quote. These single quotes occur mainly when there is a space in a sheet name. The purpose of the dot star, matching unconstrained text after an initial single quote, is unclear and it appears to create problems. I'll discuss those problems below. Besides that the two alternative prefixes are the same, and I'll refer to them collectively as the Optional External Prefix (OEP).
The OEP has its own two optional prefixes (the same in either alternative). The first is for the workbook name, an open-ended dot star in brackets.
(\[.*\])?
The second is for a "3D" cell reference, with two sheet names separated by a colon; it is the initial sheet name including the colon. The pattern here is a negated character class allowing up to 31 characters of anything except forward slash, back slash, question mark, asterisk, brackets, or colon, followed by a colon:
([^\:\\\/\?\*\[\]]{1,31}\:)?
Finally for the OEP is its only required part: a sheet name, same as the optional sheet name but with no colon. The effect is (if these all worked correctly) that the required sheet name will match if it can, and then only if there is a 3d reference or additional prior bracketed text will its optional prefixes also match.
Issues with the Workbook/Sheet name prefix: First, the dot star at the beginning of the first line is over-inclusive. Similarly, the negated character class for the sheet name appears to need additional characters including parens, comma, plus, minus, equals, and bang. Otherwise, extra material is interpreted as part of the sheet name. On my testing, this overinclusion happened with any of these:
=SUM(Sheet1!A1,Sheet2!A2)
=Sheet1!A1+Sheet2!A2
=Sheet1!A1-Sheet2!A2
Sheet names can include some of these characters, so accounting for that would require some additional measure. For instance, a sheet could be named "(Sheet1)", giving an odd formula like:
=SUM('(Sheet1)'!A1:A2)
You'd like to get the inner parens with the sheet name there, but not the outer paren. Excel puts the single quotes on that one, as it would with a space in the sheet name. You could then exclude parens in the non-single quote version since within the single quote it's ok. But then beware Excel seems to even allow single quotes in sheet names. Taking these naming quirks to the extreme, I just successfully named a sheet "Hi'Sheet1'SUM('Sheet2'!A1,A2)!". That's absurd but it points to what could happen. I learned in doing this that if I include a single quote in a sheet name, formulas escape the single quote with a second single quote. So a SUM(A1:A2) referring to the sheet I just created ends up looking like this:
=SUM('Hi''Sheet1''SUM(''Sheet2''!A1,A2)!'!A1:A2)
That actually does give some insight into the Excel parser itself. I suspect to adequately deal with this you may want separately (outside the regex) to compare the potential sheet names or workbook names to the actual sheet names, as you have done with the named ranges.
This leads to the six forms of cell references allowed in the regex (any one of which, if met, will produce a match):
"(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?"
The open paren here is closed at the end of the 6 options. Otherwise, this line allows a basic cell reference of the type "$A$1", "A1", "$A1", "A$1", or any combination of these in a multi-cell range ("$A1:A$2", etc.).
"|\$[a-z]{1,3}\:\$[a-z]{1,3}"
This one allows a cell reference of the type "$A:$B" with a dollar sign on both. Note a dollar sign on only one side will not match.
"|[a-z]{1,3}\:[a-z]{1,3}"
This line is like the last, but matches only with no dollar signs. Note a dollar sign on only one side will not match here either.
"|\$[0-9]{1,7}\:\$[0-9]{1,7}"
This line allows a cell reference of the type "$1:$2" with a dollar sign on both.
"|[0-9]{1,7}\:[0-9]{1,7}"
This version is like the last, but matches only with no dollar signs.
"|[a-z_\\][a-z0-9_\.]{0,254})"
Finally, the sixth option allows text. This text is compared to actual named ranges later in sub.
The main omission that I see here is ranges that have both absolute and relative references, of the type "A:$A" or "1:$1". While $A:A is captured because it includes "A:A", "A:$A" is not captured. You could address this and simplify the regex by combining 2 and 3 and combining 4 and 5 with optional dollar signs:
objRegExp.Pattern = "(\'.*(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\'\!" _
& "|(\[.*\])?([^\:\\\/\?\*\[\]]{1,31}\:)?[^\:\\\/\?\*\[\]]{1,31}\!)?" _
& "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
& "|\$?[a-z]{1,3}\:\$?[a-z]{1,3}" _
& "|\$?[0-9]{1,7}\:\$?[0-9]{1,7}" _
& "|[a-z_\\][a-z0-9_\.]{0,254})"
Combining these further would seem to come up against the everything-is-optional problem.
One other issue is in the initial regex pattern for matching strings, which you use to expunge potential ranges that fall inside a quoted string: objRegExp.Pattern = "\"".*\""" When I test this on a formula with a string at the beginning and end of a formula, the greediness of the dot star captures everything from the initial quote to the final quote (in other words it interprets the entire formula as one big quoted string, even though there is non-string material in the middle). It appears you can fix this by making the dot star lazy (adding a question mark after it). That raises questions about quotes within quotes, but they may not be a problem. For instance, I tested this formula:
="John loves his A1 steak sauce, but said the ""good A1 steak sauce price"" is $" & A2+A3 & " less than the ""bad price"" of $" & A4 & "."
With cell values plugged in, this formula evaluates to:
John loves his A1 steak sauce, but said the "good A1 steak sauce price" is $5 less than the "bad price" of $8.
With the lazy modifier added to your string pattern, both versions of "A1" above were recognized as occurring within a string and so were expunged, while A2, A3 and A4 were recognized as cell references.
I'm sure there are some technical issues with some of my language above, but hopefully the analysis is still useful.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With