I know that for text strings in PostgreSQL line breaks are unified by appending symbol E
or e
in front of the text:
SELECT E'first\nsecond'
results in:
first
second
But PostgreSQL also support line breaks within column names - not sure why or how evil this practice is, but one can do the following:
CREATE TABLE One("first\nsecond" text);
CREATE TABLE Two("first
second" text);
When you are unfortunate enough to run into one of these, you would find that while these queries work:
SELECT "first\nsecond" from One;
SELECT "first
second" from Two;
these ones do not:
SELECT "first
second" from One;
SELECT "first\nsecond" from Two;
My question is: Is there a way in PostgreSQL that unifies such differences, similar to the situation with the column values?
I have tried putting E
in front of "first\nsecond"
column names, but it is not supported. Trying to put \r\n
instead (I'm using Windows) gave me a third type of column names, one that can only be queried as:
SELECT "first\r\nsecond" FROM Third
Column names are identifiers, and the gory details of the syntax for identifiers are described at:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
TL;DR: use the U&"..."
syntax to inject non-printable characters into identifiers through their Unicode codepoints, and there's no way to unify CR,LF
with LF
alone.
We're allowed to use Unicode escape sequences in identifiers, so per documentation, the following does work:
select U&"first\000asecond" from Two;
if it's just a newline character between the two words.
The table is created with:
CREATE TABLE One("first\nsecond" text);
As the backslash character has no special meaning here, this column does not contain any newline.
It contains first
followed by \
followed by n
followed by second
.
So:
SELECT "first\nsecond" from One;
does work because it's the same as what's in the CREATE TABLE
whereas
SELECT "first
second" from One;
fails because there's a newline in that SELECT where the actual column name in the table has a backslash followed by a n
.
This is the opposite of "One".
CREATE TABLE Two("first
second" text);
The newline is taken verbatim and is part of the column. So
SELECT "first
second" from Two;
works because the newline is there exactly as in the CREATE TABLE, with an embedded newline, whereas
SELECT "first\nsecond" from Two;
fails because as previously \n
in this context does not mean a newline.
As mentioned in comments and your edit, this could be carriage return and newline instead, in which case the following should do:
select U&"first\000d\000asecond" from Two;
although in my test, hitting Enter in the middle of a column with psql
on Unix and Windows has the same effect: a single newline in the column's name.
To check what exact characters ended up in a column name, we can inspect them in hexadecimal.
When applied to your create table example, from inside psql under Unix:
CREATE TABLE Two("first
second" text);
select convert_to(column_name::text,'UTF-8')
from information_schema.columns
where table_schema='public'
and table_name='two';
The result is:
convert_to
----------------------------
\x66697273740a7365636f6e64
For more complex cases (e.g. non-ascii characters with several bytes in UTF-8), a more advanced query might help, for easy-to-read codepoints:
select c,lpad(to_hex(ascii(c)),4,'0') from (
select regexp_split_to_table(column_name::text,'') as c
from information_schema.columns
where table_schema='public'
and table_name='two'
) as g;
c | lpad
---+------
f | 0066
i | 0069
r | 0072
s | 0073
t | 0074
+| 000a
|
s | 0073
e | 0065
c | 0063
o | 006f
n | 006e
d | 0064
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With