Here is the database layout. I have a table with sparse sales over time, aggregated per day. If an item has 10 sales on 01-01-2015, there is an entry; if it has 0 sales, there is no entry. Something like this:
|-------------|------|-------|---------|
| day_of_year | year | sales | item_id |
|-------------|------|-------|---------|
| 01          | 2015 | 20    | A1      |
| 01          | 2015 | 11    | A2      |
| 07          | 2015 | 09    | A1      |
| ...         | ...  | ...   | ...     |
|-------------|------|-------|---------|
This is how I get a time series for one item:
SELECT doy, max(sales) FROM (
    SELECT day_of_year AS doy,
           sales
    FROM myschema.entry_daily
    WHERE item_id = 'theNameOfmyItem'
      AND year = 2015
      AND day_of_year < 150
    UNION
    SELECT doy,
           0 AS sales
    FROM generate_series(1, 149) AS doy
) AS t
GROUP BY doy
ORDER BY doy;
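For illustration, the gap-filling that the UNION/generate_series trick performs can be sketched client-side in Python (the sparse data here is hypothetical, mimicking the table above):

```python
# Dense time series from sparse per-day sales, mirroring the
# generate_series + UNION + max() trick in the query above.
sparse = {1: 20, 7: 9}  # hypothetical rows for one item: day_of_year -> sales

# For each day 1..149, take the recorded sales, or 0 when no row exists.
dense = [sparse.get(doy, 0) for doy in range(1, 150)]

print(dense[0], dense[6], dense[1])  # 20 9 0
```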
I currently loop in R, issuing one query per item, and then aggregate the results into a data frame. But this is very slow. I would like a single query that aggregates all the data in the following form:
|---------|----|----|----|----|----|-----|-----|
| item_id | 01 | 02 | 03 | 04 | 05 | ... | 149 |
|---------|----|----|----|----|----|-----|-----|
| A1      | 10 | 00 | 00 | 05 | 12 | ... | 11  |
| A2      | 11 | 00 | 30 | 01 | 15 | ... | 09  |
| A3      | 20 | 00 | 00 | 05 | 17 | ... | 20  |
| ...     |    |    |    |    |    |     |     |
|---------|----|----|----|----|----|-----|-----|
Would this be possible? By the way I am using a Postgres database.
First you need a calendar table with all dates to fill in the blank days. 100 years of dates is about 36,500 rows, so it is not very big, and it saves recalculating the series every time.
allDates:
date_id
s_date
or with the derived fields precomputed:
date_id
s_date
doy = EXTRACT(DOY FROM s_date)
year = EXTRACT(YEAR FROM s_date)
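As a sketch of what such a calendar table contains, the rows for a date range can be generated like this (Python, illustrative only; the column names follow the layout above):

```python
from datetime import date, timedelta

def all_dates(start: date, end: date):
    """Yield (date_id, s_date, doy, year) rows for the calendar table."""
    d, date_id = start, 1
    while d <= end:
        # tm_yday is the day of year, matching EXTRACT(DOY FROM s_date)
        yield (date_id, d, d.timetuple().tm_yday, d.year)
        d += timedelta(days=1)
        date_id += 1

rows = list(all_dates(date(2015, 1, 1), date(2015, 12, 31)))
# 365 rows; first row: (1, date(2015, 1, 1), 1, 2015)
```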
Your base query will be (SQL Fiddle demo):
SELECT AD.year,
       AD.doy,
       allitems.item_id,
       COALESCE(SUM(ED.sales), 0) AS max_sales
FROM (SELECT DISTINCT item_id
      FROM entry_daily) AS allitems
CROSS JOIN alldates AD
LEFT JOIN entry_daily ED
       ON ED.day_of_year = AD.doy
      AND ED.year = AD.year
      AND ED.item_id = allitems.item_id
WHERE AD.year = 2015
GROUP BY AD.year, AD.doy, allitems.item_id
ORDER BY AD.year, AD.doy, allitems.item_id;
You will get this output:
| year | doy | item_id | max_sales |
|------|-----|---------|-----------|
| 2015 | 1 | A1 | 20 |
| 2015 | 1 | A2 | 11 |
| 2015 | 2 | A1 | 0 |
| 2015 | 2 | A2 | 0 |
| 2015 | 3 | A1 | 0 |
| 2015 | 3 | A2 | 0 |
| 2015 | 4 | A1 | 0 |
| 2015 | 4 | A2 | 0 |
| 2015 | 5 | A1 | 0 |
| 2015 | 5 | A2 | 0 |
| 2015 | 6 | A1 | 0 |
| 2015 | 6 | A2 | 0 |
| 2015 | 7 | A1 | 39 |
| 2015 | 7 | A2 | 0 |
| 2015 | 8 | A1 | 0 |
| 2015 | 8 | A2 | 0 |
| 2015 | 9 | A1 | 0 |
| 2015 | 9 | A2 | 0 |
| 2015 | 10 | A1 | 0 |
| 2015 | 10 | A2 | 0 |
Then you need to install the tablefunc extension
and use crosstab() to pivot this table (sample).
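The pivot that crosstab performs on the long output above can be sketched in plain Python (using a hypothetical subset of the rows shown):

```python
# Pivot long (doy, item_id, sales) rows into one row per item,
# the same reshaping crosstab does. Rows are a hypothetical subset
# of the output table above.
long_rows = [(1, "A1", 20), (1, "A2", 11), (7, "A1", 39)]
n_days = 10

wide = {}
for doy, item, sales in long_rows:
    # One slot per day, prefilled with 0; doy is 1-based.
    wide.setdefault(item, [0] * n_days)[doy - 1] = sales

print(wide["A1"])  # [20, 0, 0, 0, 0, 0, 39, 0, 0, 0]
```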
This is the simplest and fastest way to get the expected result, and it is easy to parse the sales
column within a client program.
select item, string_agg(coalesce(sales, 0)::text, ',') sales
from (
select distinct item_id item, doy
from generate_series (1, 10) doy -- change 10 to given n
cross join entry_daily
) sub
left join entry_daily on item_id = item and day_of_year = doy
group by 1
order by 1;
item | sales
------+----------------------
A1 | 20,0,0,0,0,0,9,0,0,0
A2 | 11,0,0,0,0,0,0,0,0,0
(2 rows)
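Parsing the comma-separated sales column back into numbers on the client side is then trivial, e.g. in Python (the row tuple here is a hypothetical result as returned by the query above):

```python
# One result row from the string_agg() query: (item, "v1,v2,...").
row = ("A1", "20,0,0,0,0,0,9,0,0,0")

item, sales_csv = row
# Split the aggregated string back into a list of integers per day.
sales = [int(s) for s in sales_csv.split(",")]

print(item, sales)  # A1 [20, 0, 0, 0, 0, 0, 9, 0, 0, 0]
```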
Based on solution 1, but with array_agg() instead of string_agg(). The function creates a view with a given number of columns.
create or replace function create_items_view(view_name text, days int)
returns void language plpgsql as $$
declare
list text;
begin
select string_agg(format('s[%s] "%s"', i::text, i::text), ',')
into list
from generate_series(1, days) i;
execute(format($f$
drop view if exists %s;
create view %s as select item, %s
from (
select item, array_agg(coalesce(sales, 0)) s
from (
select distinct item_id item, doy
from generate_series (1, %s) doy
cross join entry_daily
) sub
left join entry_daily on item_id = item and day_of_year = doy
group by 1
order by 1
) q
$f$, view_name, view_name, list, days)
);
end $$;
Usage:
select create_items_view('items_view_10', 10);
select * from items_view_10;
item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
------+----+---+---+---+---+---+---+---+---+----
A1 | 20 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0
A2 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(2 rows)
Easy to use, but unwieldy with a greater number of columns because of the need to spell out the row output format.
create extension if not exists tablefunc;
select * from crosstab (
'select item_id, day_of_year, sales
from entry_daily
order by 1',
'select i from generate_series (1, 10) i'
) as ct
(item_id text, "1" int, "2" int, "3" int, "4" int, "5" int, "6" int, "7" int, "8" int, "9" int, "10" int);
item_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---------+----+---+---+---+---+---+---+---+---+----
A1 | 20 | | | | | | 9 | | |
A2 | 11 | | | | | | | | |
(2 rows)