generate_series function in Amazon Redshift

I tried the following:

SELECT * FROM generate_series(2,4);
generate_series
-----------------
           2
           3
           4
(3 rows)

SELECT * FROM generate_series(5,1,-2);                                                             
generate_series
-----------------
           5
           3
           1
(3 rows)

But when I try this:

select * from generate_series('2011-12-31'::timestamp, '2012-12-31'::timestamp, '1 day');

I get this error:

ERROR:  function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
HINT:  No function matches the given name and argument types. You may need to add explicit type casts.

I am using PostgreSQL 8.0.2 on Redshift 1.0.757.
Any idea why this happens?

UPDATE:

generate_series is working on Redshift now. For example:

SELECT CURRENT_DATE::TIMESTAMP  - (i * interval '1 day') as date_datetime 
FROM generate_series(1,31) i 
ORDER BY 1

This returns one row per day for the 31 days before today (from 31 days ago up to yesterday, as midnight timestamps), oldest first.
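The same integer approach can reproduce the range from the original failing query. The timestamp overload of generate_series only appeared in PostgreSQL 8.4, after the 8.0.2 base that Redshift reports, which is a likely reason the timestamp call is rejected. A minimal sketch, assuming integer day offsets are an acceptable substitute; the alias `cal_date` is illustrative. Because it still calls generate_series, it is subject to the leader-node restriction described in the answer below.

```sql
-- Build the 2011-12-31 .. 2012-12-31 daily range from integer offsets
-- instead of the unavailable (timestamp, timestamp, interval) overload.
SELECT '2011-12-31'::date + i AS cal_date   -- date + integer yields a date
FROM generate_series(0, 366) i              -- 0..366 covers both endpoints (2012 is a leap year)
ORDER BY 1;
```

This yields 367 rows, from 2011-12-31 through 2012-12-31 inclusive.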

asked Mar 21 '14 09:03 by DJo

1 Answer

I found a solution here to my problem of not being able to generate a time dimension table on Redshift with generate_series(). You can generate a temporary sequence with the following SQL snippet:

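with digit as (
    -- ten rows holding the digits 0-9
    select 0 as d union all 
    select 1 union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9
),
seq as (
    -- cross join four copies (units, tens, hundreds, thousands) -> 0..9999
    select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
    from digit a
        cross join
        digit b
        cross join
        digit c
        cross join
        digit d
)
-- one date per number, counting back from today (today first);
-- ORDER BY belongs on the outer query, since ordering inside a CTE is not guaranteed
select (getdate()::date - seq.num)::date as "Date"
from seq
order by seq.num;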

It seems that generate_series() is still not fully supported on Redshift. The SQL DJo posted works because it runs only on the leader node; if I prepend insert into dim_time to the same SQL, it fails.
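As a sketch of how the digit-based sequence can feed an insert where generate_series cannot, the following populates a time dimension for a fixed range. The table name `dim_time` comes from the text above, but its single `cal_date` column is a hypothetical schema chosen for illustration, and the `insert into ... with ... select` form is assumed rather than verified against a specific Redshift release.

```sql
-- Hypothetical dimension table: the cal_date column is illustrative only.
create table if not exists dim_time (cal_date date);

-- Populate it without generate_series, using the digit cross join.
insert into dim_time (cal_date)
with digit as (
    select 0 as d union all select 1 union all select 2 union all
    select 3 union all select 4 union all select 5 union all
    select 6 union all select 7 union all select 8 union all select 9
),
seq as (
    -- three digits give the integers 0..999
    select a.d + (10 * b.d) + (100 * c.d) as num
    from digit a
        cross join digit b
        cross join digit c
)
select '2011-12-31'::date + num   -- date + integer yields a date
from seq
where num <= 366;                 -- keep only 2011-12-31 .. 2012-12-31
```

Using three digit tables instead of four keeps the intermediate sequence at 1,000 rows, which is enough for a one-year range; add the thousands digit back for longer spans.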

answered Oct 17 '22 20:10 by Dhwani Katagade