I'm trying to capture the URLs before a particular word. The only trouble is that the word could also be part of the domain.
Examples (I'm trying to capture everything before dinner):
https://breakfast.example.com/lunch/dinner/
https://breakfast.example.brunch.com:8080/lunch/dinner
http://dinnerdemo.example.com/dinner/
I am able to use:
^(.*://.*/)(?=dinner/?)
The trouble I am having is that the lookahead doesn't appear to be lazy enough, so the following is failing:
https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/dinner/
as it captures:
https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/
I'm both failing to understand why and how to fix my regex. Perhaps I'm on the wrong track but how can I capture all my examples?
You can use some laziness:
^(.*?:\/\/).*?/(?=dinner/?)
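As a quick check, here is a minimal Python sketch that runs the lazy pattern above against the URLs from the question. The helper name prefix_before_dinner is made up for illustration. Note that with this pattern the wanted prefix is the whole match (group 0); group 1 only holds the scheme.

```python
import re

# Lazy pattern from the answer above. Group 1 is only the scheme ("https://");
# the full prefix before "dinner" is the whole match, i.e. group 0.
LAZY = re.compile(r"^(.*?:\/\/).*?/(?=dinner/?)")

def prefix_before_dinner(url):
    """Hypothetical helper: return the part of url before 'dinner', or None."""
    m = LAZY.match(url)
    return m.group(0) if m else None

urls = [
    "https://breakfast.example.com/lunch/dinner/",
    "https://breakfast.example.brunch.com:8080/lunch/dinner",
    "http://dinnerdemo.example.com/dinner/",
    "https://breakfast.example.com/lunch/dinner/login.html"
    "?returnURL=https://breakfast.example.com/lunch/dinner/",
]

for url in urls:
    print(prefix_before_dinner(url))
```

The last URL is the one that failed in the question; with the lazy quantifiers the match stops at the first /dinner instead of the last.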
By using a greedy .* in the middle of your regex, you consumed everything up to the last colon, which is where the engine finally found a match.
Incidentally, .* in the middle of a regex is very bad practice: on long strings it can cause horrendous backtracking and degrade performance. .*? is better here, since it is reluctant rather than greedy.
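To see where the greedy quantifier ends up, here is a small Python sketch. It uses the pattern from the question, but with the capture group split in two (my own change, for illustration only) so that the text consumed by each .* is visible.

```python
import re

url = ("https://breakfast.example.com/lunch/dinner/login.html"
       "?returnURL=https://breakfast.example.com/lunch/dinner/")

# Same shape as the question's regex, with the single group split into two
# so we can inspect what each quantifier consumed.
greedy = re.match(r"^(.*)://(.*/)(?=dinner/?)", url)
lazy = re.match(r"^(.*?)://(.*?/)(?=dinner/?)", url)

# Greedy: the first .* runs to the end and backtracks to the LAST "://".
print(greedy.group(1))
print(greedy.group(2))

# Lazy: the first .*? stops at the FIRST "://", the second at the first "/"
# that is followed by "dinner".
print(lazy.group(1))
print(lazy.group(2))
```

In the greedy case group 1 ends in returnURL=https, which is why the capture in the question runs into the query string.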
A lookahead is neither lazy nor greedy: it is only a check, and in your case it checks for a quasi-fixed string.
What you need to make lazy is the subpattern before the lookahead.
^https?:\/\/(?:[^\/]+\/)*?(?=dinner(?:\/|$))
Note: (?:/|$) acts like a boundary that ensures the word "dinner" is followed by a slash or the end of the string.
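Here is a short Python sketch of what that boundary buys you. The dinnerplate URL is a made-up example, not one from the question; it is only there to show a path segment that merely starts with "dinner".

```python
import re

# Segment-based pattern from this answer: consume "segment/" lazily until the
# next segment is exactly "dinner" (followed by "/" or the end of the string).
BOUNDED = re.compile(r"^https?:\/\/(?:[^\/]+\/)*?(?=dinner(?:\/|$))")

# Lazy pattern without the boundary, for comparison.
UNBOUNDED = re.compile(r"^(.*?:\/\/).*?/(?=dinner/?)")

# Hypothetical URL: "dinnerplate" starts with "dinner" but is another segment.
tricky = "https://breakfast.example.com/dinnerplate/dinner"

bounded_prefix = BOUNDED.match(tricky).group(0)
unbounded_prefix = UNBOUNDED.match(tricky).group(0)

print(bounded_prefix)    # stops before the real "dinner" segment
print(unbounded_prefix)  # stops too early, before "dinnerplate"

# The domain case from the question works as well.
domain_prefix = BOUNDED.match("http://dinnerdemo.example.com/dinner/").group(0)
print(domain_prefix)
```

Because each repetition consumes a whole [^/]+/ segment, the lookahead is only ever tested at the start of a segment, so "dinner" inside the domain or inside a longer segment name is skipped.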