This is a follow-up to this question (not asked by me, though). While trying to answer it, I ran into a couple of problems.
Consider the string strings123[abc789<span>123</span>def<span>456</span>000]strings456. How would one match the digits inside the square brackets that are not surrounded by span tags, in Python (using the newer regex module)?
In the example string, this would be 789 and 000.
I tried both \G, like this (demo):
(?:\G(?!\A)|\[)
[^\d\]]*
\K
\d+
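Python's built-in re module has no \G, but Pattern.match(string, pos) anchors a match at an explicit position, which can stand in for \G inside a loop. The sketch below uses stdlib re, not the regex module, and the helper name digits_after_bracket is hypothetical. It mimics the \G pattern above and shows why that pattern is not enough on its own: it also returns the digits inside the span tags.

```python
import re

# Stand-in for [^\d\]]*\K\d+ : skip chars that are neither digits nor "]", then capture digits.
STEP = re.compile(r"[^\d\]]*(\d+)")


def digits_after_bracket(s: str) -> list[str]:
    r"""Hypothetical helper emulating (?:\G(?!\A)|\[)[^\d\]]*\K\d+ with stdlib re."""
    found = []
    start = s.find("[")
    while start != -1:
        pos = start + 1  # the \[ branch: begin right after an opening bracket
        # Each match must start exactly where the previous one ended, like \G.
        while (m := STEP.match(s, pos)) is not None:
            found.append(m.group(1))
            pos = m.end()
        # The chain broke (e.g. at "]"): look for the next opening bracket.
        start = s.find("[", pos)
    return found


s = "strings123[abc789<span>123</span>def<span>456</span>000]strings456"
print(digits_after_bracket(s))  # ['789', '123', '456', '000']
```

The span contents 123 and 456 leak through because [^\d\]]* happily walks over the tags.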
and (*SKIP)(*FAIL) (demo):
<span>.*?</span>(*SKIP)(*FAIL)
|
\d+
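The stdlib re module does not support (*SKIP)(*FAIL) either. A common substitute (a different technique, not the same backtracking verbs) is to match the unwanted <span>...</span> in one alternative, capture the digits in the other, and keep only the non-empty captures. The helper name digits_outside_spans is made up for this sketch. Like the demo, it has no bracket restriction, so it also returns the numbers in strings123 and strings456:

```python
import re

# Left branch consumes a whole span (captures nothing); right branch captures digits.
PATTERN = re.compile(r"<span>.*?</span>|(\d+)")


def digits_outside_spans(text: str) -> list[str]:
    # findall returns group 1 for every match: "" whenever the span branch matched.
    return [d for d in PATTERN.findall(text) if d]


s = "strings123[abc789<span>123</span>def<span>456</span>000]strings456"
print(digits_outside_spans(s))  # ['123', '789', '000', '456']
```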
But I was unable to combine the two:
<span>.*?</span>(*SKIP)(*FAIL)
|
(?:
(?:\G(?!\A)|\[)
[^\d\]]*
(\d+)
[^\d\]]*
\K
)
How can this be done?
One of the things I like about the PyPI regex module is that it supports infinite-width lookbehind:
- Variable-length lookbehind
A lookbehind can match a variable-length string.
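For comparison, the standard-library re module only accepts fixed-width lookbehinds, so the lookbehind used in the pattern below cannot be compiled with it. A quick stdlib check (the helper name compiles_with_re is made up for this sketch):

```python
import re


def compiles_with_re(pattern: str) -> bool:
    """Return True if the stdlib re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"re.error: {exc}")
        return False
    return True


# Fixed-width lookbehind: exactly one "[" before the digits, accepted by re.
print(compiles_with_re(r"(?<=\[)\d+"))
# Variable-width lookbehind, as in the answer's pattern: rejected by re.
print(compiles_with_re(r"(?<=\[[^][]*)\d+"))
```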
>>> import regex
>>> s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
>>> rx = r'(?<=\[[^][]*)(?:<span>[^<]*</span>(*SKIP)(?!)|\d+)(?=[^][]*])'
>>> regex.findall(rx, s)
['789', '000']
>>>
Pattern details:
- (?<=\[[^][]*) - there must be a [ followed by zero or more chars other than ] and [ immediately to the left of the current location
- (?: - start of a non-capturing group
- <span>[^<]*</span>(*SKIP)(?!) - match <span>, then 0+ chars other than < (with a [^<]* negated character class), then </span>, and discard the match while staying at the match end position, then go on to look for the next match
- | - or
- \d+ - 1+ digits
- ) - end of the non-capturing group
- (?=[^][]*]) - there must be a ] after zero or more chars other than ] and [ immediately to the right of the current location.
I thought of an algorithm which works as follows.
Search for the square brackets and their contents, and store the result in a variable. The regex would be \[[^]]*\].
Now search for the <span> tags and replace each of them with - (just to simplify the next step). The regex would be (<span>.*?</span>).
You are now left with the contents of the square brackets minus whatever was inside <span> tags. Simply search with \d+ to match the digits.
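A minimal sketch of these three steps with the standard-library re module. The function name digits_outside_spans_in_brackets is hypothetical, and the sketch assumes that neither the brackets nor the span tags are nested:

```python
import re

BRACKETS = re.compile(r"\[[^\]]*\]")     # step 1: same as \[[^]]*\]
SPANS = re.compile(r"<span>.*?</span>")  # step 2: lazy, so each span is matched separately
DIGITS = re.compile(r"\d+")              # step 3


def digits_outside_spans_in_brackets(text: str) -> list[str]:
    result = []
    for chunk in BRACKETS.findall(text):  # e.g. "[abc789<span>123</span>...000]"
        # Replacing with "-" (not "") keeps digits on either side of a span from merging.
        cleaned = SPANS.sub("-", chunk)
        result.extend(DIGITS.findall(cleaned))
    return result


s = "strings123[abc789<span>123</span>def<span>456</span>000]strings456"
print(digits_outside_spans_in_brackets(s))  # ['789', '000']
```

For the example string, step 2 turns the bracketed part into [abc789-def-000], so step 3 only sees 789 and 000.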