How can I get the text between the first and second occurrence of a sequence while including the first occurrence and excluding the second occurrence?
So for example:
Let my sequence be = "xx - "
Let my text be = "xx - blablabla bla blaxx - blablabla bla bla xx - blabla"
So I want my regular expression to get this chunk:
"xx - blablabla bla bla"
I tried something like this:
^xx - .*xx -
but this gets the text between the first and third occurrence, and does not exclude the last occurrence.
(xx - )(.*?)\1
Explanation
(xx - )  # your sequence (group 1)
(.*?)    # anything, match non-greedily into group 2
\1       # whatever group 1 was
Group 2 holds the text between the two occurrences. Since you want the first occurrence included, take group 1 followed by group 2.
Be aware that regex engines use different styles of back-referencing; the most common alternative to \1 is $1.
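As a quick illustration, here is a minimal sketch of the pattern in Python's re module, which uses the \1 style. The variable names are only for illustration, and the choice of Python is an assumption since the question does not name a language.

```python
import re

text = "xx - blablabla bla blaxx - blablabla bla bla xx - blabla"

# Group 1 is the sequence, group 2 is the non-greedy middle part,
# and \1 requires the same text as group 1 to appear again.
pattern = re.compile(r"(xx - )(.*?)\1")

match = pattern.search(text)

# Text strictly between the first and second occurrence.
between = match.group(2)

# First occurrence included, second occurrence excluded.
with_first = match.group(1) + match.group(2)

print(repr(between))
print(repr(with_first))
```

Concatenating group 1 and group 2 gives the chunk asked for in the question, while the second occurrence is consumed by \1 and left out of both groups.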
Here's why your approach does not work: Your error is extremely common. It consists of thinking that .* would somehow magically stop at the right point to let the rest of the regex match. It does not.
.* goes right to the end of the line/string, without any consideration for what follows. That's called "greedy matching". When it hits the end of the string, backtracking occurs. The first point where the rest of your regex can match is the last occurrence of your sequence, seen from the end of the string. You end up with the longest possible match.
.*? does what's called "non-greedy matching". It checks the rest of the regex before it moves to the next character. That's why the first occurrence of your sequence can match. You end up with the shortest possible match.
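The difference between the two quantifiers can be seen side by side in a small Python sketch. It reuses the sample text from the question and assumes the sequence ends with a trailing space ("xx - "); the variable names are hypothetical.

```python
import re

text = "xx - blablabla bla blaxx - blablabla bla bla xx - blabla"

# Greedy: .* runs to the end of the string, then backtracks
# until the trailing "xx - " can match, which is the LAST occurrence.
greedy = re.search(r"xx - .*xx - ", text).group(0)

# Non-greedy: .*? expands one character at a time and stops
# as soon as the trailing "xx - " can match, which is the NEXT occurrence.
lazy = re.search(r"xx - .*?xx - ", text).group(0)

print(repr(greedy))
print(repr(lazy))
```

The greedy match spans all three occurrences of the sequence, while the non-greedy match ends at the second one.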