Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unescape a string with escaped newlines and carriage returns

I'm trying to write a PLPGSQL function which obfuscates/censors/redacts text.

-- Obfuscate a body of text by replacing lowercase letters and numbers with # symbols.
CREATE OR REPLACE FUNCTION obfuscate(str text) RETURNS text AS $$
BEGIN
  str := replace(str, '\r', E'\r');
  str := replace(str, '\n', E'\n');
  str := translate(str, 'abcdefghijklmnopqrstuvwxyz0123456789', rpad('#',36,'#'));
  str := replace(str, E'\r', '\r');
  str := replace(str, E'\n', '\n');
  RETURN str;
END
$$ LANGUAGE plpgsql;

This works, but note the dance to convert escaped newlines and carriage returns to their respective byte, and then back again. This is because my dataset contains strings that have been escaped (data which has been serialized to JSON/YAML), and I don't want to clobber those values.

Is there another more convenient way to unescape a string? It would be great to handle other escaped values, like unicode escape sequences, too.

like image 295
Brent Dillingham Avatar asked Jan 19 '16 02:01

Brent Dillingham


People also ask

What does Unescape mean?

Definition of unescaped : not escaped : retained unescaped vapors.

What is unescape string?

JavaScript unescape() Function The unescape() function in JavaScript takes a string as a parameter and uses to decode that string encoded by the escape() function. The hexadecimal sequence in the string is replaced by the characters they represent when decoded via unescape(). Syntax: unescape(string)

What is escape and unescape in JavaScript?

The escape() and unescape() functions is to Encode and decode a string in JavaScript. The escape() function in JavaScript to make a string portable to transmit it over a network and we can use unscape() function to get back the original string.

How do you escape a character in a string?

\ is a special character within a string used for escaping. "\" does now work because it is escaping the second " . To get a literal \ you need to escape it using \ .


1 Answers

To "unescape" a string, you have to "execute" it - literally. Use the EXECUTE command in plpgsql.
You can wrap this into a function. Naive approach:

CREATE OR REPLACE FUNCTION f_unescape(text, OUT _t text)
  LANGUAGE plpgsql STABLE AS
$func$
BEGIN
   EXECUTE 'SELECT E''' || $1 || ''''
   INTO _t;
END
$func$;

Call:

SELECT f_unescape('\r\nabcdef\t0123\x123\n');

This naive function is vulnerable to single quotes in the original string, which need to be escaped. But that's a bit tricky. Single quotes can be escaped in two ways in a Posix escape string syntax: \' or ''. But we could also have \\' etc. Basics:

  • Insert text with single quotes in PostgreSQL

We could enclose the string in dollar quoting, but that does not work for Posix escape string syntax. E'\'' cannot be replaced with E$$\'$$. We could add SET standard_conforming_strings = off to the function, then we wouldn't have to prepend strings with E. But that would disable function inlining and interpret escapes everywhere in the function body.

Instead, escape all ' and all (optionally) leading \ with regexp_replace():

regexp_replace($1, '(\\*)(\''+)', '\1\1\2\2', 'g')

(\\*) .. 0 or more leading \
(\''+) .. capture 1 or more '
'\1\1\2\2' .. double up each match
'g' .. replace all occurrences, not just the first

Safe function

CREATE OR REPLACE FUNCTION f_unescape(IN text, OUT _t text)
  RETURNS text
  LANGUAGE plpgsql STABLE AS
$func$
BEGIN
   EXECUTE $$SELECT E'$$ || regexp_replace($1, '(\\*)(\''+)', '\1\1\2\2', 'g') || $$'$$
   INTO _t;
END
$func$;

The operation cannot be reversed reliable. There is no way to tell which special character was escaped before and which wasn't. You can escape all or none. Or do it manually like before. But if the same character was included in literal and escape form, you cannot tell them apart any more.

Test case:

SELECT t, f_unescape(t)
FROM  (
   VALUES
     ($$'$$)
   , ($$''$$)
   , ($$'''$$)
   , ($$\'$$)
   , ($$\\'$$)
   , ($$\\\'$$)
   , ($$\\\'''$$)
   , ($$\r\\'nabcdef\\\t0123\x123\\\\\'''\n$$)
   ) v(t);
like image 155
Erwin Brandstetter Avatar answered Oct 14 '22 02:10

Erwin Brandstetter