Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is RegEx handled differently in VBA and JavaScript?

I'm using a regular expression in Excel VBA to parse the results of a swim meet. The code reads a row of text that was copied from a PDF and outputs the important data into individual cells. Since the format of the string varies throughout the source PDF, the regular expression is quite complicated. Still, I'm able to parse 95% of the data at this point.

Some of the rows that are not being parsed are confusing me, though. VBA is clearly not able to find a match with the regular expression, but when I copy the exact same regex and string into this website, JavaScript is able to find a match without a problem. Is there something different in the way VBA and JavaScript handle regular expressions that might account for this?

Here's the string that VBA refuses to match:

12. NUNEZ CHENG, Walter 74 Club Tennis Las Terr 3:44.57 123

Here's the function I'm using in Excel (mostly successfully):

Function singleLineResults(SourceString As String) As Variant
    Dim cSubmatches As Variant
    Dim collectionArray(11) As String
    Dim cnt As Integer
    Dim oMatches As MatchCollection

    With New RegExp
        .MultiLine = MultiLine
        .IgnoreCase = IgnoreCase
        .Global = False

        '1. JAROSOVA, Lenka 27 Swimmpower Prague 2:26.65 605 34.45 37.70 37.79 36.71
        .Pattern = "(\d*)\.?\s?([^,]+),\s([^\d]+)\s?(\d+)\s((?:[A-Z]{3})?)\s?((?:(?!\d\:\d).)*)\s?((?:\d+:)?\d+\.\d+)(?:\s(\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:\s((?:\d+:)?\d+.\d+))?(?:Splash Meet Manager 11, Build \d{5} Registered to [\w\s]+ 2014-08-\d+ \d+:\d+ - Page \d+)?$"

        Set oMatches = .Execute(SourceString)
        If oMatches.Count > 0 Then
            For Each submatch In oMatches(0).SubMatches
                collectionArray(cnt) = submatch '.Value
                cnt = cnt + 1
            Next
        Else
            singleLineResults = Null
        End If
    End With

    singleLineResults = collectionArray()
End Function
like image 412
carpiediem Avatar asked Aug 22 '14 02:08

carpiediem


People also ask

Does VBA support regular expressions?

RegEx stands for “Regular Expression” in VBA Excel and is a sequence of characters that defines the search pattern for finding a specific pattern of characters in a string of values. So, in a simple word, we can create a regular expression pattern and use it to search for the string of that pattern.

Does RegEx work in JavaScript?

In JavaScript, you can write RegExp patterns using simple patterns, special characters, and flags. In this section, we'll explore the different ways to write regular expressions while focusing on simple patterns, special characters, and flags.

How do I write RegEx in VBA?

How to start using RegEx. To start using RegEx in VBA, you need to select and activate the RegEx object reference library from the references list. Start by going to the VBA IDE, select Tools –> References, then select Microsoft VBScript Regular Expressions 5.5, then click OK.


1 Answers

Could you add more examples to what actually matches? E.g. the surrounding lines that matches, and better yet, examples that are not supposed to match if any?

I've tried "cleaning" up a bit in the regex, removing groups that are not used to match that particular line, to make the error more obvious, and changed how one of the groups works, which might actually fix the issue:

(\d*)
\.?\s?
([^,]+)
,\s
([^\d]+)
\s?
(\d+)
\s
(
  (?:[A-Z]{3})?
)
\s?
(
# OLD SOLUTION
#  (?:
#    (?!\d\:\d)
#    .
#  )*

# NEW SOLUTION
  .*?
)
\s?
(
  (?:\d+:)?
  \d+\.\d+
)
(?:
  \s
  (\d+)
)?
$

See example on regex101.

The group that puzzles me the most, however, is this one:

(?:[A-Z]{3})?

Why the 3 character limit, when it only matches the first 3 letters in the street name?

like image 54
Johny Skovdal Avatar answered Oct 06 '22 01:10

Johny Skovdal