Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split comma separated values into target table with fixed number of columns

I have a table with a single column in a Postgres 13.1 database. It consists of many rows with comma-separated values - around 20 elements at most.

I want to split the data into multiple columns. But I have only a limited number of columns say 5 and more than 5 CSV values in a single row, so excess values must be shifted to new/next row). How to do this?

Example:

a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
'
'
'

Columns are only 5, so the output would be like:

c1 c2 c3 c4 c5
---------------
a1 b1 c1
a2 b2 c2 d2 e2 
f2
a3 b3 c3 d3 e3
f3 g3 h3 i3 j3
a4
a5 b5 c5
'
'
'
like image 694
Martin2000 Avatar asked Nov 16 '25 09:11

Martin2000


1 Answers

It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.

While stuck with your current situation ...

For known small maximum number of elements

A simple solution without trickery or recursion will do:

SELECT id, 1 AS rnk
     , split_part(csv, ', ', 1) AS c1
     , split_part(csv, ', ', 2) AS c2
     , split_part(csv, ', ', 3) AS c3
     , split_part(csv, ', ', 4) AS c4
     , split_part(csv, ', ', 5) AS c5
FROM   tbl
WHERE  split_part(csv, ', ', 1) <> '' -- skip empty rows

UNION ALL
SELECT id, 2
     , split_part(csv, ', ', 6)
     , split_part(csv, ', ', 7)
     , split_part(csv, ', ', 8)
     , split_part(csv, ', ', 9)
     , split_part(csv, ', ', 10)
FROM   tbl
WHERE  split_part(csv, ', ', 6) <> '' -- skip empty rows

-- three more blocks to cover a maximum "around 20"

ORDER  BY id, rnk;

db<>fiddle here

id being the PK of the original table.
This assumes ', ' as separator, obviously.
You can adapt easily.

Related:

  • Split comma separated column data into additional columns

For unknown number of elements

Various ways. One way use regexp_replace() to replace every fifth separator before unnesting ...

-- for any number of elements
SELECT t.id, c.rnk
     , split_part(c.csv5, ', ', 1) AS c1
     , split_part(c.csv5, ', ', 2) AS c2
     , split_part(c.csv5, ', ', 3) AS c3
     , split_part(c.csv5, ', ', 4) AS c4
     , split_part(c.csv5, ', ', 5) AS c5
FROM   tbl t
     , unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER  BY t.id, c.rnk;

db<>fiddle here

This assumes that the chosen separator ; never appears in your strings. (Just like , can never appear.)

The regular expression pattern is the key: '((?:.*?,){4}.*?),'

(?:) ... “non-capturing” set of parentheses
() ... “capturing” set of parentheses
*? ... non-greedy quantifier
{4}? ... sequence of exactly 4 matches

The replacement '\1;' contains the back-reference \1.

'g' as fourth function parameter is required for repeated replacement.

Further reading:

  • PostgreSQL & regexp_split_to_array + unnest
  • Apply `trim()` and `regexp_replace()` on text array
  • PostgreSQL unnest() with element number

Other ways to solve this include a recursive CTE or a set-returning function ...

Fill from right to left

(Like you added in How to put values starting from the right side into columns?)
Simply count down numbers like:

SELECT t.id, c.rnk
     , split_part(c.csv5, ', ', 5) AS c1
     , split_part(c.csv5, ', ', 4) AS c2
     , split_part(c.csv5, ', ', 3) AS c3
     , split_part(c.csv5, ', ', 2) AS c4
     , split_part(c.csv5, ', ', 1) AS c5
FROM ...

db<>fiddle here

like image 57
Erwin Brandstetter Avatar answered Nov 19 '25 09:11

Erwin Brandstetter



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!