Problem with whitespace in a RegEx with capture groups

Question

I've got a regular expression that I'm trying to match against the following types of data, with each token separated by an unknown number of spaces.

Update: "Text" can be almost any character, which is why I had .* initially. Importantly, it can also include spaces.

1. Text
2. Text 01
3. Text 01 of 03
4. Text 01 (of 03)
5. Text 01-03

I'd like to capture "Text", "01", and "03" as separate groups, and all except "Text" are optional. The best I've been able to do so far is:

\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)

This matches #3-#5, and puts them in the proper capture groups. I can't figure out, though, why when I add an additional ? to the end to make the part of the expression after 01 optional, my capture groups get all funky.

\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)?

The RegEx above matches #2-#5, but the capture groups are correct only for #2 and #5.

This seems like a straightforward regular expression, so I don't know why I'm having so much trouble with it.

This is a link to an online RegEx evaluator I'm using to help me debug this: http://regexr.com?2tb64. The link already has the first RegEx and the test data filled in.

ridgerunner · Accepted Answer

You didn't say which regex tool you are using, so I am assuming the lowest common denominator, i.e. JavaScript. Here is one that works:

var re = /^\s*(.+?)(?:\s+(\d+)(?:(?:\s+\(?of\s+|-)(\d+)\)?)?)?$/i;

To make this work in your Regexr tool, be sure to turn on the "multi-line option".
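As a quick check outside Regexr, here is a minimal sketch that applies the same pattern to each sample one string at a time, so no multi-line flag is needed. The names `acceptedRe` and `parseLine` are made up for illustration and are not part of the original answer.

```javascript
// Pattern from the answer above, tested against one input string at a time.
const acceptedRe = /^\s*(.+?)(?:\s+(\d+)(?:(?:\s+\(?of\s+|-)(\d+)\)?)?)?$/i;

// Hypothetical helper: returns the three captures, with null for
// optional groups that did not take part in the match.
function parseLine(line) {
  const m = acceptedRe.exec(line);
  if (!m) return null;
  return { text: m[1], first: m[2] ?? null, second: m[3] ?? null };
}

const samples = ["Text", "Text 01", "Text 01 of 03", "Text 01 (of 03)", "Text 01-03"];
for (const s of samples) {
  console.log(JSON.stringify(s), parseLine(s));
}
```

Because `(.+?)` is lazy and the pattern is anchored with `^` and `$`, "Text" may itself contain spaces: the engine extends group 1 only as far as it must for the rest of the pattern to reach the end of the string.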

Here is the same thing in PHP syntax (with lots of juicy comments!):

$re = '/ # Always write non-trivial regex in free-space mode!
    ^                  # Anchor to start of string.
    \s*                # Optional leading whitespace is ok.
    (.+?)              # Text can be pretty much anything.
    (?:                # Group to allow applying ? quantifier.
      \s+              # WS separates "Text" from first number.
      (\d+)            # First number.
      (?:              # Group to allow applying ? quantifier.
        (?:            # Second number prefix alternatives:
          \s+\(?of\s+  # either " of 03" or " (of 03)",
        | -            # or just a dash for the "-03" case.
        )              # End second number prefix alternatives.
        (\d+)          # Second number.
        \)?            # Match ")" for " (of 03)" case.
      )?               # Second number is optional.
    )?                 # First number is optional.
    $                  # Anchor to end of string.
    /ix';
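
To illustrate why the lazy `(.+?)` and the anchors matter, here is a rough JavaScript sketch comparing the question's second pattern (greedy `(.*)`, optional tail, no anchors) with the pattern above. The names `greedyRe`, `lazyRe` and `captureGroups` are hypothetical, chosen only for this example.

```javascript
// Question's second attempt: greedy (.*) followed by an optional tail.
const greedyRe = /\s*(.*)\s+(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?\s*)?/;
// Pattern from this answer: lazy (.+?) anchored at both ends.
const lazyRe = /^\s*(.+?)(?:\s+(\d+)(?:(?:\s+\(?of\s+|-)(\d+)\)?)?)?$/i;

// Hypothetical helper: the three capture groups, or null if there is no match.
// Groups that did not participate in the match are undefined.
function captureGroups(re, line) {
  const m = re.exec(line);
  return m ? m.slice(1, 4) : null;
}

for (const line of ["Text 01 of 03", "Text 01 (of 03)", "Text 01-03"]) {
  console.log(line, "| greedy:", captureGroups(greedyRe, line),
                    "| lazy:", captureGroups(lazyRe, line));
}
```

With the greedy pattern, `(.*)` first swallows the whole string and then backtracks only until `\s+(\d+)` can match. Since the tail is optional, the engine stops at the last whitespace-plus-digits it finds, so "Text 01 of 03" yields group 1 = "Text 01 of" and group 2 = "03". "Text 01-03" still comes out right because it contains only one whitespace character.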

Joe · Answer

Try this:
http://regexr.com?2tb67

Regex looks something like:

(\w+?)\s+(\d*)[^\d]*(\d+)

Match word characters, followed by whitespace, then any digits (possibly none), followed by anything that isn't a digit, then match the remaining digits.

Note that the second result probably isn't ideal for you because 01 comes in the third group match. But it matches all your cases.
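
For comparison, a small sketch of how this pattern behaves when each sample is tested as its own string. That is an assumption about usage; the regexr link runs it over the whole block of test data, where `\s+` and `[^\d]*` can also span line breaks. The names `joeRe` and `joeGroups` are invented for the example.

```javascript
// Pattern from this answer, unchanged.
const joeRe = /(\w+?)\s+(\d*)[^\d]*(\d+)/;

// Hypothetical helper: the three capture groups, or null if there is no match.
function joeGroups(line) {
  const m = joeRe.exec(line);
  return m ? m.slice(1, 4) : null;
}

const joeSamples = ["Text", "Text 01", "Text 01 of 03", "Text 01 (of 03)", "Text 01-03"];
for (const s of joeSamples) {
  console.log(JSON.stringify(s), joeGroups(s));
}
```

Tested this way, the last three samples give "Text", "01", "03". A standalone "Text" does not match because `\s+` requires at least one whitespace character, and a standalone "Text 01" splits the number: `(\d*)` has to give back a digit so that the mandatory `(\d+)` can match, leaving "0" in group 2 and "1" in group 3.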

stema · Answer

Your second one is close.

So I reworked it on regexr; it now matches everything in the correct groups:

\s*(\w*)\s+(?:\s*(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?)?)?
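
A minimal sketch for trying this pattern in JavaScript, one string at a time rather than over the whole regexr test block (that is an assumption about usage). The names `stemaRe` and `stemaGroups` are hypothetical.

```javascript
// Pattern from this answer, with the optional parentheses written as \(? and \)?.
const stemaRe = /\s*(\w*)\s+(?:\s*(\d+)\s*(?:\s*\(?\s*(?:of|-)\s*(\d+)\s*\)?)?)?/;

// Hypothetical helper: the three capture groups, or null if there is no match.
// Groups that did not participate in the match are undefined.
function stemaGroups(line) {
  const m = stemaRe.exec(line);
  return m ? m.slice(1, 4) : null;
}

const stemaSamples = ["Text", "Text 01", "Text 01 of 03", "Text 01 (of 03)", "Text 01-03"];
for (const s of stemaSamples) {
  console.log(JSON.stringify(s), stemaGroups(s));
}
```

Samples #2 to #5 land in the expected groups. A standalone "Text" with no trailing whitespace does not match, because `\s+` after `(\w*)` is mandatory; in a multi-line test block the line break presumably plays that role. Note also that `(\w*)` does not allow spaces inside "Text".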

Tags: regex, whitespace, capture-group

Asked by Dov
