Why regular expression is stuck?

Question

I've incrementally updated a regular expression to cleanup a data and found it runs infinity long for the string. It's a Python code to test.

import re

re.sub(r'(([-.]?\d+)|(\d+))+$', '', 'bcn16-01081-210300-16-20160829-ca')

I suppose it's because of duplicates of \d+ in groups because it became working if reg exp is simplified to ([-.]\d+|\d+)+$. But I want to know more precise. Anybody knows?

Ryszard Czech · Accepted Answer

In (([-.]?\d+)|(\d+))+$, both [-.]?\d+ and \d+ are practically matching the same strings. When a + is applied on the right, the expression starts behaving in a similar way to the notorious (a+)*$ pattern.

Use

[-.]?\d+(?:[-.]\d+)*$

([-.]\d+|\d+)+$ is still catastrophic backtracking prone because of nested +. See more about catastrophic backtracking (and here).

Explanation

--------------------------------------------------------------------------------
  [-.]?                    any character of: '-', '.' (optional
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [-.]                     any character of: '-', '.'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  $                        before an optional 
, and the end of the
                           string

Why regular expression is stuck?

Tags:

python

regex

regex-group

python-re

stasdavydov

1 Answers

Ryszard Czech

Recent Activity

Donate For Us

Why regular expression is stuck?

Tags:

python

regex

regex-group

python-re

stasdavydov

1 Answers

Ryszard Czech

Related questions

Recent Activity

Donate For Us