Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL tuple format

Is there any document describing the tuple format that PostgreSQL server adheres to? The official documentation appears arcane about this.

A single tuple seems simple enough to figure out, but when it comes to arrays of tuples, arrays of composite tuples, and finally nested arrays of composite tuples, it is impossible to be certain about the format simply by looking at the output.

I am asking this following my initial attempt at implementing pg-tuple, a parser that's still missing today, to be able to parse PostgreSQL tuples within Node.js


Examples

create type type_A as (
   a int,
   b text
);
  • with a simple text: (1,hello)
  • with a complex text: (1,"hello world!")

create type type_B as (
   c type_A,
   d type_A[]
);
  • simple-value array: {"(2,two)","(3,three)"}

  • for type_B[] we can get:

{"(\"(7,inner)\",\"{\"\"(88,eight-1)\"\",\"\"(99,nine-2)\"\"}\")","(\"(77,inner)\",\"{\"\"(888,eight-3)\"\",\"\"(999,nine-4)\"\"}\")"}

It gets even more complex for multi-dimensional arrays of composite types.


UPDATE

Since it feels like there is no specification at all, I have started working on reversing it. Not sure if it can be done fully though, because from some initial examples it is often unclear what formatting rules are applied.

like image 709
vitaly-t Avatar asked Sep 15 '16 21:09

vitaly-t


People also ask

What is a tuple in PostgreSQL?

A tuple is PostgreSQL's internal representation of a row in a table. A single row may have many tuples representing it, but only one of these tuples will be applicable at any single point in time.

How do I categorize data in PostgreSQL?

CREATE TABLE groups ( id int PRIMARY KEY, name varchar(30) ); CREATE TABLE emp ( id int PRIMARY KEY, name varchar(50), group_id int REFERENCES groups ); INSERT INTO groups(id,name) VALUES (1, 'Managers'), (2, 'Teachers'), (3, 'Cleaners'), (4, 'Drivers'); INSERT INTO emp(id,name,group_id) VALUES (1,'john',1), (2,'Smith' ...

What is Pg_relation_size?

pg_relation_size accepts the OID or name of a table, index or toast table, and returns the on-disk size in bytes of one fork of that relation. (Note that for most purposes it is more convenient to use the higher-level functions pg_total_relation_size or pg_table_size , which sum the sizes of all forks.)

What are live tuples?

The 124,510,280 live tuples are the number of tuples that are up to date in the table and available to be read or modified in transactions. The 3,087,919 dead tuples are the number of tuples that have been changed and are unavailable to be used in future transactions.


1 Answers

As Nick posted, according to docs:

the whitespace will be ignored if the field type is integer, but not if it is text.

and

The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space.

and

Double quotes and backslashes embedded in field values will be doubled.

and now quoting Nick himself:

nested elements are converted to strings, and then quoted / escaped like any other string

I give shorted example below, comfortably compared against its nested value:

a=# create table playground (t text, ta text[],f float,fa float[]);
CREATE TABLE
a=# insert into playground select 'space here',array['','bs\'],8.0,array[null,8.1];
INSERT 0 1
a=# insert into playground select 'no_space',array[null,'nospace'],9.0,array[9.1,8.0];
INSERT 0 1
a=# select playground,* from playground;
                    playground                     |     t      |       ta       | f |     fa
---------------------------------------------------+------------+----------------+---+------------
 ("space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | space here | {"","bs\\"}    | 8 | {NULL,8.1}
 (no_space,"{NULL,nospace}",9,"{9.1,8}")           | no_space   | {NULL,nospace} | 9 | {9.1,8}
(2 rows)

If you go for deeper nested quoting, look at:

a=# select nested,* from (select playground,* from playground) nested;
                                                         nested                                                          |                    playground                     |     t      |       ta       | f |     fa
-------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+------------+----------------+---+------------
 ("(""space here"",""{"""""""",""""bs\\\\\\\\""""}"",8,""{NULL,8.1}"")","space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | ("space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | space here | {"","bs\\"}    | 8 | {NULL,8.1}
 ("(no_space,""{NULL,nospace}"",9,""{9.1,8}"")",no_space,"{NULL,nospace}",9,"{9.1,8}")                                   | (no_space,"{NULL,nospace}",9,"{9.1,8}")           | no_space   | {NULL,nospace} | 9 | {9.1,8}
(2 rows)

As you can see, the output again follows rules the above.

This way in short answers to your questions would be:

  • why array is normally presented inside double-quotes, while an empty array is suddenly an open value? (text representation of empty array does not contain comma or space or etc)
  • why a single " is suddenly presented as \""? (text representation of 'one\ two', according to rules above is "one\\ two", and text representation of the last is ""one\\\\two"" and it is just what you get)
  • why unicode-formatted text is changing the escaping for \? How can we tell the difference then? (According to docs,

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote

), so it is not unicode text, but the the way you tell postgres that it should interpret escapes in text not as symbols, but as escapes. Eg E'\'' will be interpreted as ' and '\'' will make it wait for closing ' to be interpreted. In you example E'\\ text' the text represent of it will be "\\ text" - we add backslsh for backslash and take value in double quotes - all as described in online docs.

  • the way that { and } are escaped is not always clear (I could not anwer this question, because it was not clear itself)
like image 190
Vao Tsun Avatar answered Oct 02 '22 15:10

Vao Tsun