I've incrementally updated a regular expression to cleanup a data and found it runs infinity long for the string. It's a Python code to test.
import re
re.sub(r'(([-.]?\d+)|(\d+))+$', '', 'bcn16-01081-210300-16-20160829-ca')
I suppose it's because of duplicates of \d+ in groups because it became working if reg exp is simplified to ([-.]\d+|\d+)+$. But I want to know more precise. Anybody knows?
In (([-.]?\d+)|(\d+))+$, both [-.]?\d+ and \d+ are practically matching the same strings. When a + is applied on the right, the expression starts behaving in a similar way to the notorious (a+)*$ pattern.
Use
[-.]?\d+(?:[-.]\d+)*$
([-.]\d+|\d+)+$ is still catastrophic backtracking prone because of nested +. See more about catastrophic backtracking (and here).
Explanation
--------------------------------------------------------------------------------
[-.]? any character of: '-', '.' (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[-.] any character of: '-', '.'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With