I wrote this regex to match all href and src links in an HTML page (I know I should be using a parser; this is just experimenting):
/((href|src)\=\").*?\"/
# Without look-behind
It works fine, but when I try to modify the first portion of the expression as a look-behind pattern:
/(?<=(href|src)\=\").*?\"/
# With look-behind
It throws an error stating 'invalid look-behind pattern'. Any ideas what's going wrong with the look-behind?
Lookbehind has restrictions:
(?<=subexp) look-behind
(?<!subexp) negative look-behind
Subexp of look-behind must be fixed character length.
But different character length is allowed in top level
alternatives only.
ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed.
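One way to check these rules is to try compiling each pattern and see which ones are rejected. This is only a minimal sketch: `compiles?` is a hypothetical helper written for this illustration (not part of Ruby), and the result for the nested-alternative patterns may depend on your Ruby version and its regex engine, so no expected output is shown.

```ruby
# Hypothetical helper: returns true if the pattern source compiles,
# false if the regex engine rejects it.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

[
  '(?<=a|bc)',              # different lengths, top-level alternatives
  '(?<=aaa(?:b|cd))',       # different lengths, nested alternatives
  '(?<=(href|src)=").*?"',  # the look-behind from the question
  '(?<=href="|src=").*?"',  # alternatives moved to the top level
].each do |source|
  puts "#{source.ljust(24)} compiles: #{compiles?(source)}"
end
```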
You cannot put alternatives of different lengths (here href and src) below the top level of a (negative) lookbehind.
Put them at the top level instead. You also don't need to escape = and " as you did.
/(?<=href="|src=").*?"/
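To see what the corrected pattern actually returns, here is a small sketch using `String#scan`. The sample HTML string is made up for illustration. Note that the pattern as written still includes the closing quote in each match; the second pattern is an assumed variation (not from the answer above) that stops before the quote.

```ruby
# Sample input invented for this example.
html = '<a href="https://example.com/page">link</a> <img src="/images/logo.png">'

# Alternatives are at the top level of the look-behind, so this compiles.
# Each match runs up to and including the closing double quote.
with_quote = /(?<=href="|src=").*?"/
p html.scan(with_quote)

# Assumed variation: match only non-quote characters, so the closing
# quote is left out of the result.
without_quote = /(?<=href="|src=")[^"]*/
p html.scan(without_quote)
```

If you only want the attribute values, the second form is probably closer to what you are after, but for anything beyond experimenting an HTML parser is still the safer choice.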