Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why regular expression is stuck?

I've incrementally updated a regular expression to cleanup a data and found it runs infinity long for the string. It's a Python code to test.

import re

re.sub(r'(([-.]?\d+)|(\d+))+$', '', 'bcn16-01081-210300-16-20160829-ca')

I suppose it's because of duplicates of \d+ in groups because it became working if reg exp is simplified to ([-.]\d+|\d+)+$. But I want to know more precise. Anybody knows?

like image 563
stasdavydov Avatar asked Feb 16 '26 20:02

stasdavydov


1 Answers

In (([-.]?\d+)|(\d+))+$, both [-.]?\d+ and \d+ are practically matching the same strings. When a + is applied on the right, the expression starts behaving in a similar way to the notorious (a+)*$ pattern.

Use

[-.]?\d+(?:[-.]\d+)*$

([-.]\d+|\d+)+$ is still catastrophic backtracking prone because of nested +. See more about catastrophic backtracking (and here).

Explanation

--------------------------------------------------------------------------------
  [-.]?                    any character of: '-', '.' (optional
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [-.]                     any character of: '-', '.'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
like image 126
Ryszard Czech Avatar answered Feb 18 '26 09:02

Ryszard Czech