I have to clean up some varchar in the following manner:
Remove special characters such as: !, @, #, $, %, ^, &, *, (, ), }, {, [, ], ",", ., ?, /, ',
from a closed list. I've managed to do so with a mass use of replace\regexp_replace but I'm looking for something similar to the one in SQL server.
Remove following numbers but not adjacent ones meaning:
round 1 --> round
round1 --> round1
round 12345 --> round
round12345 --> round12345
Remove words out of a closed list of words such as: "and", "or", "Ltd.", "international" without substrings:
more food or drinks ---> more food drinks. and not --> me food or drinks
I'm working on a big table and I'd like to do it as efficient as possible.
Should I write a function that does that or is there a more elegant way?
I'll play along. Here's for question 2:
SELECT trim(regexp_replace(
'12 foo1 run 1457 ABC 1Foo 2456 bar 34',
'\s*\m\d+\M\s*',
' ',
'g'
));
Returns:
foo1 run ABC 1Foo bar
I updated the answer to use constraint expressions instead of the bracket expressions, because the manual informs:
The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.
\s*
.. zero or more white space\m
.. beginning of a word (same as [[:<:]]
)\d+
.. 1 or more digits\M
.. end of a word (same as [[:>:]]
)
The 4th parameter 'g'
is needed to replace "globally", not just the first occurrence.
->sqlfiddle for v9.2
->sqlfiddle for v8.4 doesn't work
standard_conforming_strings
. The default changed with v9.1.
This works in both worlds, "compatibility mode" so to say. But the syntax of the modern version above (in combination with standard_conforming_strings = on
) is cleaner.
SELECT trim(regexp_replace(
'12 foo1 run 1457 ABC 1Foo 2456 bar 34',
E'\\s*\\m\\d+\\M\\s*',
' ',
'g'
));
->sqlfiddle
1. It is much better to replace character which are not in allowed list with spaces, like this:
select regexp_replace(
E'aśx€ ąsd, dsa w|adwf\n as dw dgaa[aw] asdd',
'[^a-zA-Z0-9]',
' ',
'g');
This returns
a x sd dsa w adwf as dw dgaa aw asdd
There are thousands of possible characters in Unicode — it isn't really possible to list all special characters.
Taking out multiple consecutive spaces left as exercise to the reader.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With