 

Redshift: Convert comma-delimited values into rows

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table where one of the columns contains comma-separated values. For example:

I have:

user_id | user_name | user_action
---------------------------------------
1       | Shone     | start,stop,cancell...

I would like to see:

user_id | user_name | parsed_action
-----------------------------------
1       | Shone     | start
1       | Shone     | stop
1       | Shone     | cancell
...
Yuri Levinsky, asked Aug 04 '14 05:08


People also ask

What does split_part do in SQL?

Splits a string on the specified delimiter and returns the part at the specified position.
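For instance, applied to the example string from the question (a hedged illustration; the literal is made up):

select split_part('start,stop,cancell', ',', 2);
-- returns 'stop' (the second comma-delimited part; positions are 1-based)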

How do you get part of a string in Redshift?

To extract the beginning segment of a string based on the length in bytes, you can CAST the string as VARCHAR(byte_length) to truncate the string, where byte_length is the required length.
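As a sketch of that technique (the literal and length are illustrative only):

select cast('start,stop,cancell' as varchar(5));
-- truncates the string to its first 5 bytes, i.e. 'start'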


2 Answers

A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.

Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.

Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:

select
  (row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;

If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:

select
  n::int
into numbers
from
  (select
     row_number() over (order by true) as n
   from cmd_logs)
cross join
  (select
     max(regexp_count(user_action, '[,]')) as max_num
   from cmd_logs)
where
  n <= max_num + 1;

Once there is a numbers table, we can do:

select
  user_id,
  user_name,
  split_part(user_action, ',', n) as parsed_action
from
  cmd_logs
cross join
  numbers
where
  split_part(user_action, ',', n) is not null
  and split_part(user_action, ',', n) != '';
Bob Baxley, answered Sep 21 '22 07:09


Another idea is to transform your CSV string into JSON first, followed by a JSON extraction, along the following lines:

... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced

... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.n - 1) AS parsed_action

Where "numbers" is the table from the first answer. The advantage of this approach is the ability to use built-in JSON functionality.
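Putting the pieces together, a full query might look like the following (a sketch, assuming the cmd_logs table from the question and the numbers table with column n from the first answer; note that JSON_EXTRACT_ARRAY_ELEMENT_TEXT takes a zero-based position, hence the n - 1):

select
  user_id,
  user_name,
  json_extract_array_element_text(
    '["' || replace(user_action, ',', '", "') || '"]',
    numbers.n - 1
  ) as parsed_action
from cmd_logs
cross join numbers
where numbers.n <= regexp_count(user_action, '[,]') + 1;

One caveat with this approach: if the values themselves contain double quotes or backslashes, the constructed string will not be valid JSON, so it is safest on clean, simple tokens like the ones in the question.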

YakovK, answered Sep 21 '22 07:09