
Split column into multiple rows in Postgres

Suppose I have a table like this:

        subject     | flag
    ----------------+------
     this is a test |    2

subject is of type text, and flag is of type int. I would like to transform this table to something like this in Postgres:

        token       | flag
    ----------------+------
     this           |    2
     is             |    2
     a              |    2
     test           |    2

Is there an easy way to do this?

mgoldwasser asked Apr 02 '15 18:04




2 Answers

In Postgres 9.3+ use a LATERAL join. Minimal form:

    SELECT token, flag
    FROM   tbl, unnest(string_to_array(subject, ' ')) token
    WHERE  flag = 2;

The comma in the FROM list is (almost) equivalent to CROSS JOIN; LATERAL is automatically assumed for set-returning functions (SRF) in the FROM list. Why "almost"? See:

  • "invalid reference to FROM-clause entry for table" in Postgres query

The table alias "token" for the function result also serves as the column alias for its single anonymous column, and the query assumes that column names are distinct across all tables involved. Equivalent, more verbose and less error-prone:

    SELECT s.token, t.flag
    FROM   tbl t
    CROSS  JOIN LATERAL unnest(string_to_array(subject, ' ')) AS s(token)
    WHERE  t.flag = 2;

Or move the SRF to the SELECT list, which is allowed in Postgres (but not in standard SQL), to the same effect:

    SELECT unnest(string_to_array(subject, ' ')) AS token, flag
    FROM   tbl
    WHERE  flag = 2;

The last form seems acceptable now that the behavior of SRFs in the SELECT list has been sanitized in Postgres 10. See:

  • What is the expected behaviour for multiple set-returning functions in SELECT clause?

If unnest() does not return any rows (empty or NULL subject), the (implicit) join eliminates the row from the result. Use LEFT JOIN ... ON true to keep qualifying rows from tbl. See:

  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
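
The LEFT JOIN variant mentioned above could look roughly like this. It is a minimal sketch, not a tested drop-in: the inline VALUES list stands in for tbl, and its second row (an empty subject) is hypothetical sample data added only to show the difference.

```sql
-- Hypothetical sample rows in place of tbl; the second subject is empty.
SELECT s.token, t.flag
FROM  (VALUES ('this is a test', 2), ('', 2)) AS t(subject, flag)
-- LEFT JOIN ... ON true keeps rows of t even when unnest() returns no rows.
LEFT   JOIN LATERAL unnest(string_to_array(t.subject, ' ')) AS s(token) ON true
WHERE  t.flag = 2;
```

With CROSS JOIN LATERAL the row with the empty subject would drop out; here it should be kept with a NULL token.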

We could also use regexp_split_to_table(), but that's typically slower because regular expressions cost a bit more. See:

  • SQL select rows containing substring in text field
  • PostgreSQL unnest() with element number
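
For comparison, the regexp_split_to_table() alternative might be written as follows. This is a sketch under assumptions: the inline VALUES row is hypothetical sample data with repeated spaces, and the pattern '\s+' relies on standard_conforming_strings being on (the default in current versions).

```sql
-- Hypothetical sample row with runs of spaces between words.
SELECT s.token, t.flag
FROM  (VALUES ('this  is a   test', 2)) AS t(subject, flag)
-- '\s+' splits on any run of whitespace.
CROSS  JOIN LATERAL regexp_split_to_table(t.subject, '\s+') AS s(token)
WHERE  t.flag = 2;
```

One design difference worth noting: string_to_array(subject, ' ') produces an empty-string element between consecutive spaces, while a pattern like '\s+' treats each run of whitespace as a single delimiter. That flexibility is what the extra regular expression cost buys.
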
Erwin Brandstetter answered Sep 19 '22 10:09


I don't think a join is necessary; the unnest() function in conjunction with string_to_array() should do it:

    SELECT unnest(string_to_array(subject, ' ')) as "token", flag FROM test;

     token | flag
    -------+------
     this  |    2
     is    |    2
     a     |    2
     test  |    2

Matt answered Sep 23 '22 10:09