I'm trying to write a PLPGSQL function which obfuscates/censors/redacts text.
-- Obfuscate a body of text by replacing lowercase letters and numbers with # symbols.
CREATE OR REPLACE FUNCTION obfuscate(str text) RETURNS text AS $$
BEGIN
str := replace(str, '\r', E'\r');
str := replace(str, '\n', E'\n');
str := translate(str, 'abcdefghijklmnopqrstuvwxyz0123456789', rpad('#',36,'#'));
str := replace(str, E'\r', '\r');
str := replace(str, E'\n', '\n');
RETURN str;
END
$$ LANGUAGE plpgsql;
This works, but note the dance to convert escaped newlines and carriage returns to their respective byte, and then back again. This is because my dataset contains strings that have been escaped (data which has been serialized to JSON/YAML), and I don't want to clobber those values.
Is there another more convenient way to unescape a string? It would be great to handle other escaped values, like unicode escape sequences, too.
Definition of unescaped : not escaped : retained unescaped vapors.
JavaScript unescape() Function The unescape() function in JavaScript takes a string as a parameter and uses to decode that string encoded by the escape() function. The hexadecimal sequence in the string is replaced by the characters they represent when decoded via unescape(). Syntax: unescape(string)
The escape() and unescape() functions is to Encode and decode a string in JavaScript. The escape() function in JavaScript to make a string portable to transmit it over a network and we can use unscape() function to get back the original string.
\ is a special character within a string used for escaping. "\" does now work because it is escaping the second " . To get a literal \ you need to escape it using \ .
To "unescape" a string, you have to "execute" it - literally. Use the EXECUTE
command in plpgsql.
You can wrap this into a function. Naive approach:
CREATE OR REPLACE FUNCTION f_unescape(text, OUT _t text)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
EXECUTE 'SELECT E''' || $1 || ''''
INTO _t;
END
$func$;
Call:
SELECT f_unescape('\r\nabcdef\t0123\x123\n');
This naive function is vulnerable to single quotes in the original string, which need to be escaped. But that's a bit tricky. Single quotes can be escaped in two ways in a Posix escape string syntax: \'
or ''
. But we could also have \\'
etc. Basics:
We could enclose the string in dollar quoting, but that does not work for Posix escape string syntax. E'\''
cannot be replaced with . We could add E$$\'$$
SET standard_conforming_strings = off
to the function, then we wouldn't have to prepend strings with E
. But that would disable function inlining and interpret escapes everywhere in the function body.
Instead, escape all '
and all (optionally) leading \
with regexp_replace()
:
regexp_replace($1, '(\\*)(\''+)', '\1\1\2\2', 'g')
(\\*)
.. 0 or more leading \
(\''+)
.. capture 1 or more '
'\1\1\2\2'
.. double up each match'g'
.. replace all occurrences, not just the first
CREATE OR REPLACE FUNCTION f_unescape(IN text, OUT _t text)
RETURNS text
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
EXECUTE $$SELECT E'$$ || regexp_replace($1, '(\\*)(\''+)', '\1\1\2\2', 'g') || $$'$$
INTO _t;
END
$func$;
The operation cannot be reversed reliable. There is no way to tell which special character was escaped before and which wasn't. You can escape all or none. Or do it manually like before. But if the same character was included in literal and escape form, you cannot tell them apart any more.
Test case:
SELECT t, f_unescape(t)
FROM (
VALUES
($$'$$)
, ($$''$$)
, ($$'''$$)
, ($$\'$$)
, ($$\\'$$)
, ($$\\\'$$)
, ($$\\\'''$$)
, ($$\r\\'nabcdef\\\t0123\x123\\\\\'''\n$$)
) v(t);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With