I do not understand, why my columns reg1 and reg2 remove "bbb" from my string, and only reg3 works as expected.
WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT
teststring,
regexp_replace(teststring, '<.+>') AS reg1,
regexp_replace(teststring, '<.*>') AS reg2,
regexp_replace(teststring, '<.*?>') AS reg3
FROM t
TESTSTRING REG1 REG2 REG3
aaa <b>bbb</b> ccc aaa ccc aaa ccc aaa bbb ccc
Thanks a lot!
Because regex is greedy by default. I.e. the expressions .*
or .+
try to take as many characters as possible. Therefore <.+>
will span from the first <
to the last >
. Make it lazy by using the lazy operator ?
:
regexp_replace(teststring, '<.+?>')
or
regexp_replace(teststring, '<.*?>')
Now, the search for >
will stop at the first >
encountered.
Note that .
includes >
as well, therefore the greedy variant (without ?
) swallows all the >
but the last.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With