Error while using regexp_split_to_table (Amazon Redshift)

I have the same question as this one:
Splitting a comma-separated field in Postgresql and doing a UNION ALL on all the resulting tables
The only difference is that my 'fruits' column is delimited by '|'. When I try:

SELECT 
    yourTable.ID, 
    regexp_split_to_table(yourTable.fruits, E'|') AS split_fruits
FROM yourTable

I get the following:

ERROR: type "e" does not exist

Q1. What does the E do? I saw some examples where E is not used. The official docs don't explain it in their "quick brown fox..." example.

Q2. How do I use '|' as the delimiter for my query?

Edit: I am using PostgreSQL 8.0.2 (the version Amazon Redshift reports). Neither unnest() nor regexp_split_to_table() is supported.

asked Mar 10 '15 22:03 by Reise45




1 Answer

A1

E is the prefix for escape string constants, which interpret C-style backslash escapes. You don't normally need it in modern Postgres; only prepend it if you want backslash sequences in the string to be interpreted, like E'\n' for a newline character. Details and links to documentation:

  • Insert text with single quotes in PostgreSQL
  • SQL select where column begins with \

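E is pointless noise in your query. Modern Postgres would accept it and treat E'|' the same as '|', but Redshift's parser evidently does not know the E'' syntax: it reads E as a type name followed by a string literal (the type 'string' form of a typed constant), which is exactly what produces type "e" does not exist. The answer you are linking to is not very good, I am afraid.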

A2

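Drop the E; it isn't needed in modern Postgres. But note that | is a regular-expression metacharacter (alternation), so an unescaped '|' matches the empty string and splits the value into single characters. Escape it to match a literal pipe:

SELECT id, regexp_split_to_table(fruits, '\|') AS split_fruits
FROM   tbl;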
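To see why the pipe must be escaped, here is a rough analogy in Python rather than SQL: Python's re module, like PostgreSQL's regex engine, treats | as alternation. This is only a sketch, assuming Python 3.7+ (which allows splitting on empty matches); PostgreSQL's handling of zero-length matches differs in detail, so the unescaped output is not meant to match Postgres row for row. The sample value is made up.

```python
import re

value = "apple|banana|cherry"  # made-up sample value

# Escaped: \| matches a literal pipe character.
escaped = re.split(r"\|", value)

# A bracket expression also matches a literal pipe.
bracketed = re.split(r"[|]", value)

# Unescaped: "|" is alternation of two empty branches, so it matches the
# empty string at every position and splits between every character.
unescaped = re.split("|", value)

print(escaped)  # ['apple', 'banana', 'cherry']
print(unescaped)
```

In SQL, the raw string r"\|" corresponds to '\|' in a standard string (standard_conforming_strings on, the default since PostgreSQL 9.1); with the E prefix the backslash itself has to be doubled, as in E'\\|'. A bracket expression such as '[|]' avoids the backslash question entirely.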

For simple delimiters, you don't need expensive regular expressions. This is typically faster:

SELECT id, unnest(string_to_array(fruits, '|')) AS split_fruits
FROM   tbl;

In Postgres 9.3+ you'd rather use a LATERAL join for set-returning functions:

SELECT t.id, f.split_fruits
FROM   tbl t
LEFT   JOIN LATERAL unnest(string_to_array(fruits, '|')) AS f(split_fruits)
                                                                   ON true;
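The practical difference between the two unnest() forms shows up with NULL values: a set-returning function in the SELECT list drops rows for which it returns nothing, while LEFT JOIN LATERAL ... ON true keeps them. The following Python sketch models this, assuming rows are (id, fruits) tuples, None stands for NULL, and fruits is either NULL or a non-empty string. The function names are hypothetical and only mirror the two queries.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, Optional[str]]  # (id, fruits); None stands for NULL


def string_to_array(s: Optional[str], delim: str) -> Optional[List[str]]:
    """Model of string_to_array() for NULL or non-empty input: plain split."""
    return None if s is None else s.split(delim)


def select_list_unnest(tbl: Iterable[Row]) -> List[Row]:
    """Models: SELECT id, unnest(string_to_array(fruits, '|')) FROM tbl"""
    out: List[Row] = []
    for row_id, fruits in tbl:
        arr = string_to_array(fruits, "|")
        # unnest(NULL) yields zero rows, so a NULL row disappears entirely.
        for fruit in arr or []:
            out.append((row_id, fruit))
    return out


def left_join_lateral(tbl: Iterable[Row]) -> List[Row]:
    """Models: ... LEFT JOIN LATERAL unnest(...) AS f(split_fruits) ON true"""
    out: List[Row] = []
    for row_id, fruits in tbl:
        arr = string_to_array(fruits, "|") or []
        if not arr:
            # LEFT JOIN keeps the outer row and pads the fruit with NULL.
            out.append((row_id, None))
        for fruit in arr:
            out.append((row_id, fruit))
    return out


tbl = [(1, "apple|banana"), (2, None)]  # hypothetical sample rows
print(select_list_unnest(tbl))  # [(1, 'apple'), (1, 'banana')]
print(left_join_lateral(tbl))   # [(1, 'apple'), (1, 'banana'), (2, None)]
```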

Details:

  • What is the difference between LATERAL and a subquery in PostgreSQL?
  • PostgreSQL unnest() with element number

Amazon Redshift is not Postgres

It only implements a reduced set of features as documented in its manual. In particular, there are no table functions, including the essential functions unnest(), generate_series() or regexp_split_to_table() when working with its "compute nodes" (accessing any tables).

You should go with a normalized table layout to begin with (extra table with one fruit per row).

Or here are some options to create a set of rows in Redshift:

  • How to select multiple rows filled with constants in Amazon Redshift?

This workaround should do it:

  1. Create a table of numbers with at least as many rows as the largest number of fruits a single value in your column can hold. Make it temporary, or permanent if you'll keep using it. Say we never have more than 9:

    CREATE TEMP TABLE nr9(i int);
    INSERT INTO nr9(i) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9);
    
  2. Join to the number table and use split_part(), which is actually implemented in Redshift:

    SELECT *, split_part(t.fruits, '|', n.i) As fruit
    FROM   nr9 n
    JOIN   tbl t ON split_part(t.fruits, '|', n.i) <> ''
    

Voilà.
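The numbers-table workaround can be modeled in a few lines of Python to check its edge cases. This is a sketch, not Redshift itself: split_part() is reimplemented here on the assumption that it returns the i-th field counted from 1, an empty string past the last field, and NULL for NULL input; range(1, 10) stands in for the nr9 table, and split_rows() is a hypothetical name for the join.

```python
from typing import Iterable, List, Optional, Tuple


def split_part(s: Optional[str], delim: str, i: int) -> Optional[str]:
    """Model of split_part(): field i (1-based), '' past the last field."""
    if s is None:
        return None  # NULL in, NULL out
    parts = s.split(delim)
    return parts[i - 1] if i <= len(parts) else ""


NR9 = range(1, 10)  # stands in for the nr9 table: i = 1..9


def split_rows(tbl: Iterable[Tuple[int, Optional[str]]]) -> List[Tuple[int, int, str]]:
    """Models: FROM nr9 n JOIN tbl t ON split_part(t.fruits, '|', n.i) <> ''"""
    rows = list(tbl)  # iterate the table once per number
    out: List[Tuple[int, int, str]] = []
    for i in NR9:
        for row_id, fruits in rows:
            fruit = split_part(fruits, "|", i)
            # NULL <> '' is not true, so NULL fruits never join; '' is filtered out.
            if fruit is not None and fruit != "":
                out.append((row_id, i, fruit))
    return out


tbl = [(1, "apple|banana|cherry"), (2, "kiwi"), (3, "a||b")]  # hypothetical rows
print(sorted(split_rows(tbl)))
# [(1, 1, 'apple'), (1, 2, 'banana'), (1, 3, 'cherry'), (2, 1, 'kiwi'), (3, 1, 'a'), (3, 3, 'b')]
```

Two caveats follow from the ON split_part(...) <> '' condition: empty fields between delimiters ('a||b') are dropped along with the padding, and any fruits beyond the ninth are silently lost, so size the numbers table to the longest list you expect.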

answered Jan 04 '23 05:01 by Erwin Brandstetter