Redshift. Convert comma delimited values into rows

Tags: amazon-web-services, amazon-redshift

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table in which one of the columns holds comma-separated values. For example:

I have:

    user_id|user_name|user_action
    -----------------------------
    1      | Shone   | start,stop,cancell...

I would like to see

    user_id|user_name|parsed_action
    -------------------------------
    1      | Shone   | start
    1      | Shone   | stop
    1      | Shone   | cancell
    ....


asked Aug 04 '14 05:08

Yuri Levinsky

2 Answers

A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.

Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.

Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:

    select
      (row_number() over (order by true))::int as n
    into numbers
    from cmd_logs
    limit 100;
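Because the table is built by numbering rows of cmd_logs, it can never hold more rows than cmd_logs itself, even with limit 100. A quick sanity check, assuming the numbers table and its column n were created as in the statement above, might look like this:

```sql
-- Confirm the numbers table covers the range we expect.
-- If row_count is below 100, cmd_logs had fewer than 100 rows
-- and longer lists would be silently truncated later on.
select
    count(*) as row_count,
    min(n)   as first_n,
    max(n)   as last_n
from numbers;
```

If last_n is smaller than the longest list in user_action, the numbers table would need to be built from a larger source table.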

If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:

    select
      n::int
    into numbers
    from
      (select
          row_number() over (order by true) as n
       from cmd_logs)
    cross join
      (select
          max(regexp_count(user_action, '[,]')) as max_num
       from cmd_logs)
    where
      n <= max_num + 1;
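The "+ 1" in that query comes from the fact that a list with k commas has k + 1 items. A minimal illustration on a literal string (the sample value is made up for this example):

```sql
-- regexp_count returns how many times the pattern occurs in the string.
-- Two commas separate three items, hence the "+ 1".
select regexp_count('start,stop,cancel', '[,]') + 1 as item_count;  -- 3
```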

Once there is a numbers table, we can do:

    select
      user_id,
      user_name,
      split_part(user_action,',',n) as parsed_action
    from
      cmd_logs
    cross join
      numbers
    where
      split_part(user_action,',',n) is not null
      and split_part(user_action,',',n) != '';
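To see the cross join at work end to end, here is a small self-contained sketch. The table names demo_logs and demo_numbers and the sample rows are hypothetical, chosen so they do not collide with the real cmd_logs and numbers tables; the numbers are inserted by hand because a two-row source table would be too short to generate them from row numbers.

```sql
-- Hypothetical demo tables; names and data are made up for illustration.
create temp table demo_logs (
    user_id     int,
    user_name   varchar(50),
    user_action varchar(200)
);

insert into demo_logs values
    (1, 'Shone', 'start,stop,cancel'),
    (2, 'Ann',   'start');

create temp table demo_numbers (n int);
insert into demo_numbers values (1), (2), (3), (4);

-- Every log row is paired with every n; split_part returns an empty
-- string when n exceeds the number of items, so those pairs are dropped.
select
    user_id,
    user_name,
    split_part(user_action, ',', n) as parsed_action
from demo_logs
cross join demo_numbers
where split_part(user_action, ',', n) is not null
  and split_part(user_action, ',', n) != ''
order by user_id, n;
```

With this sample data the query should return four rows: start, stop and cancel for user 1, and start for user 2. Note that the empty-string filter also drops genuinely empty items such as the middle one in 'a,,b'.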


answered Sep 21 '22 07:09

Bob Baxley

Another idea is to transform your CSV string into JSON first, followed by JSON extract, along the following lines:

    ... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced

    ... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.n - 1) AS parsed_action

Where "numbers" is the table from the first answer. Its column n starts at 1, while JSON_EXTRACT_ARRAY_ELEMENT_TEXT takes a zero-based position, hence the n - 1. The advantage of this approach is the ability to use built-in JSON functionality.
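Put together, the two fragments could be combined roughly as follows. This is only a sketch under a few assumptions: the delimiter is a comma, the numbers table has a column n starting at 1 as created in the first answer, and user_action contains no double quotes or backslashes (those would make the generated string invalid JSON). The alias t and the JSON_ARRAY_LENGTH filter are additions for this example, not part of the answer.

```sql
select
    user_id,
    user_name,
    -- Array positions are zero-based, so n = 1 maps to position 0.
    json_extract_array_element_text(replaced, n - 1) as parsed_action
from (
    select
        user_id,
        user_name,
        -- 'start,stop' becomes '["start","stop"]'
        '["' || replace(user_action, ',', '","') || '"]' as replaced
    from cmd_logs
) as t
cross join numbers
-- Keep only positions that exist in this row's array.
where n <= json_array_length(replaced);
```

Filtering on the array length rather than on an empty result keeps empty items (for example the middle one in 'a,,b'), which may or may not be what you want.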

answered Sep 21 '22 07:09

YakovK
