I have data like "11223311" and I want all the multiple occurrence to be replaced by single occurrence i.e. the above should turn into '123'. I am working in SAP HANA.
But by using below logic I am getting '1231' from '11223311'.
SELECT REPLACE_REGEXPR('(.)\1+' IN '11223331' WITH '\1' OCCURRENCE ALL) FROM DUMMY;
Your regular expression only replaces multiple consecutive occurrences of characters; that's what the \1+
directly after it's matching (.)
is doing.
You can use look-ahead to remove all characters that also occur somewhere after that match. Note that this keeps the last occurrence, not the first:
SELECT REPLACE_REGEXPR('(.)(?=.*\1)' IN '11223331' WITH '' OCCURRENCE ALL) FROM DUMMY
This returns: 231
If you want to keep the first occurrence, I don't see a possibility just with one regex (I could be wrong though). Using a look-behind in the same way does not work because it would need to be variable-length, which is not supported in HANA and most other implementations. Often \K is recommended as alternative, but something like (.).*\K\1
wouldn't work with replace all, because all characters before \K are still consumed in replace. If you could run the same regex in a loop, it could work but then why not use a non-regex loop (like a user-defined HANA function) in the first place.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With