The question of how to split a field (e.g. a CSV string) into multiple rows has already been answered: Split values over multiple rows.
However, that question refers to MSSQL, and the answers use various features for which there are no Redshift equivalents.
For the sake of completeness, here's an example of what I'd like to do:
Current data:
| Key | Data     |
+-----+----------+
| 1   | 18,20,22 |
| 2   | 17,19    |
Required data:
| Key | Data     |
+-----+----------+
| 1   | 18       |
| 1   | 20       |
| 1   | 22       |
| 2   | 17       |
| 2   | 19       |
Now, I can suggest a workaround for the case of a small, bounded number of elements in the CSV field: use split_part and UNION over all possible positions, like so:
SELECT Key, split_part(Data, ',', 1)
FROM mytable
WHERE split_part(Data, ',', 1) <> ''  -- empty string literal needs single quotes; "" is an identifier
UNION
SELECT Key, split_part(Data, ',', 2)
FROM mytable
WHERE split_part(Data, ',', 2) <> ''
-- etc. etc.
However, this is obviously very inefficient, and would not work for longer lists.
Any better ideas on how to do this?
EDIT:
There's also a somewhat similar question regarding multiplying rows: splitting rows in Redshift. However, I don't see how that approach can be applied here.
EDIT 2:
A possible duplicate: Redshift. Convert comma delimited values into rows. But nothing new - the answer by @Masashi Miyazaki is similar to my suggestion above, and suffers from the same issues.
For reference, Redshift's split_part(string, delimiter, position) splits a string on the specified delimiter and returns the part at the specified position.
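For example, positions are 1-based:

SELECT split_part('18,20,22', ',', 2);  -- returns '20'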
To unnest queries, Amazon Redshift uses PartiQL syntax to iterate over SUPER arrays, navigating the array in the FROM clause of a query.
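That suggests a more direct approach to this problem: convert the CSV string to a SUPER array with split_to_array, then unnest it. A minimal sketch against the mytable example above (hedged: it assumes a Redshift release with SUPER support, and that PartiQL navigation works over a subquery alias as shown):

SELECT t.Key, elem::varchar AS Data
FROM (
    -- split_to_array returns a SUPER array of the CSV elements
    SELECT Key, split_to_array(Data, ',') AS parts
    FROM mytable
) AS t, t.parts AS elem;  -- PartiQL navigation emits one row per array element

Unlike the UNION workaround, this needs no assumption about the maximum number of elements per row.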
Amazon Redshift also natively supports the PIVOT and UNPIVOT SQL operators, which can be applied to tables, sub-queries, and common table expressions (CTEs).
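UNPIVOT does not split strings by itself, but it can replace the repeated UNION branches in the workaround above; the bound on list length remains. A sketch under those assumptions (the p1..p3 aliases are illustrative, not from the original):

SELECT Key, Data
FROM (
    SELECT Key,
           split_part(Data, ',', 1) AS p1,
           split_part(Data, ',', 2) AS p2,
           split_part(Data, ',', 3) AS p3
    FROM mytable
)
UNPIVOT (Data FOR part_no IN (p1, p2, p3))
WHERE Data <> '';  -- split_part returns '' past the last element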
Here is the Redshift answer; it will work with up to 10,000 values per row.
Set up test data
create table test_data (key varchar(50),data varchar(max));
insert into test_data
values
(1,'18,20,22'),
(2,'17,19')
;
Code
with ten_numbers as (
    -- the digits 0-9
    select 1 as num union select 2 union select 3 union select 4 union select 5
    union select 6 union select 7 union select 8 union select 9 union select 0
)
, generated_numbers AS
(
    -- cross join the digits four times to generate the numbers 0-9999
    SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
    FROM ten_numbers AS t1
    JOIN ten_numbers AS t2 ON 1 = 1
    JOIN ten_numbers AS t3 ON 1 = 1
    JOIN ten_numbers AS t4 ON 1 = 1
)
, splitter AS
(
    -- keep only the positions actually needed: 1 .. max element count
    -- (element count = comma count + 1)
    SELECT *
    FROM generated_numbers
    WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(data, '\\,') + 1)
                                 FROM test_data)
)
, expanded_input AS
(
    -- cross join each row with every position and pick out that element
    SELECT
        key,
        split_part(data, ',', s.gen_num) AS data
    FROM test_data AS td
    JOIN splitter AS s ON 1 = 1
    WHERE split_part(data, ',', s.gen_num) <> ''
)
SELECT * FROM expanded_input
ORDER BY key, data;
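Against the test data this produces the required rows (note data is a varchar, so the ordering is lexical):

| key | data |
+-----+------+
| 1   | 18   |
| 1   | 20   |
| 1   | 22   |
| 2   | 17   |
| 2   | 19   |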