I have a regex to strip the end off a request URL:
re.sub('(?:^\/en\/category).*(-\d{1,4}$)', '', r)
My problem is that the docs say it will replace only the matched part; however, when it matches my string, it replaces the whole string, e.g.:
/en/category/specials/men-2610
I'm not sure what Python is doing, but my regex seems fine.
EDIT: I wish to have the string with the end stripped off, target =
/en/category/specials/men
As stated in the docs, the matched part is replaced. Matched is different from captured.
You will have to capture the text you don't want to remove in a capture group like so:
(^/en/category.*)-\d{1,4}$
and put it back into the string using the backreference \1:
re.sub(r'(^/en/category.*)-\d{1,4}$', r'\1', text)
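To make the matched-versus-captured distinction concrete, here is a minimal sketch using the URL from the question. The variable names (url, matched, captured, emptied, stripped) are only illustrative and not part of the original post.

```python
import re

url = "/en/category/specials/men-2610"

# The asker's original pattern: the .* is part of the match,
# so the match spans the entire string.
m = re.search(r'(?:^/en/category).*(-\d{1,4}$)', url)
matched = m.group(0)    # everything the pattern consumed
captured = m.group(1)   # only what the parentheses captured

# re.sub replaces group(0), not group(1), so the whole string disappears.
emptied = re.sub(r'(?:^/en/category).*(-\d{1,4}$)', '', url)

# Capturing the part to keep and writing it back with \1 strips only the suffix.
stripped = re.sub(r'(^/en/category.*)-\d{1,4}$', r'\1', url)

print(repr(matched), repr(captured), repr(emptied), repr(stripped))
```

Here group(0) is the full URL while group(1) is only "-2610", which is why substituting an empty string removes everything.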
(?<=^\/en\/category)(.*)-\d{1,4}$
Try this. Replace by \1. See demo:
https://regex101.com/r/tX2bH4/27
Your whole pattern matches, which is why it replaces the whole string.
P.S. A match is different from captures or groups.
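import re
# The lookbehind keeps "/en/category" out of the match; group 1 is the part to keep.
p = re.compile(r'(?<=^\/en\/category)(.*)-\d{1,4}$', re.IGNORECASE)
test_str = "/en/category/specials/men-2610"
subst = r"\1"  # raw string, so re sees the backreference \1 rather than chr(1)
result = re.sub(p, subst, test_str)  # "/en/category/specials/men"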
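One detail worth checking in snippets like this is whether the replacement is written as a raw string. The following is a small sketch, with illustrative variable names (plain, raw, bad, good) that are not from the original answer, comparing the two spellings on the same input.

```python
import re

pattern = re.compile(r'(?<=^/en/category)(.*)-\d{1,4}$')
url = "/en/category/specials/men-2610"

plain = "\1"    # one character, chr(1): Python resolved the escape already
raw = r"\1"     # two characters, backslash and '1': a backreference for re

bad = pattern.sub(plain, url)   # inserts a control character instead of group 1
good = pattern.sub(raw, url)    # inserts the captured text

print(repr(bad))
print(repr(good))
```

Without the r prefix (or a doubled backslash, "\\1"), the regex engine never sees a backreference, so the captured text is lost.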