Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate multiple time series in one sql query?

Tags:

sql

r

postgresql

Here is the database layout. I have a table with sparse sales over time, aggregated per day. If for an item I have 10 sales on the 01-01-2015, I will have an entry, but If I have 0, then I have no entry. Something like this.

|--------------------------------------|
| day_of_year | year | sales | item_id |
|--------------------------------------|
|      01     | 2015 |  20   |   A1    |
|      01     | 2015 |  11   |   A2    | 
|      07     | 2015 |  09   |   A1    | 
|     ...     | ...  |  ...  |  ...    | 
|--------------------------------------|

This is how I get a time series for 1 item.

SELECT doy, max(sales) FROM (
    SELECT day_of_year AS doy,
           sales       AS sales
      FROM myschema.entry_daily
     WHERE item_id = theNameOfmyItem
       AND year = 2015
       AND day_of_year < 150
     UNION
    SELECT doy AS doy,
           0   AS sales
      FROM generate_series(1, 149) AS doy) as t
GROUP BY doy
ORDER BY doy;

And I currently loop with R making 1 query for every item. I then aggregate the results in a dataframe. But this is very slow. I would actually like to have only one query that would aggregate all the data in the following form.

|----------------------------------------------|
| item_id | 01 | 02 | 03 | 04 | 05 | ... | 149 |
|----------------------------------------------|
|    A1   | 10 | 00 | 00 | 05 | 12 | ... |  11 |
|    A2   | 11 | 00 | 30 | 01 | 15 | ... |  09 |
|    A3   | 20 | 00 | 00 | 05 | 17 | ... |  20 |
|                       ...                    |
|----------------------------------------------|

Would this be possible? By the way I am using a Postgres database.

like image 336
Paul Fournel Avatar asked Oct 24 '15 18:10

Paul Fournel


2 Answers

First you need a table with all dates to fill the blank dates. 100 years of date mean 36,000 rows so no very big. Instead of calculate every time.

allDates:

date_id
s_date

or created calculating the fields

date_id
s_date
doy = EXTRACT(DOY FROM s_date)
year = EXTRACT(YEAR FROM s_date)

Your base query will be SQL FIDDLE DEMO:

SELECT           
      AD.year,
      AD.doy,           
      allitems.item_id,
      COALESCE(SUM(ED.sales), 0) as max_sales
FROM 
    (SELECT DISTINCT item_id
     FROM entry_daily 
    ) as allitems
CROSS JOIN alldates AD
LEFT JOIN entry_daily ED
       ON ED.day_of_year = AD.doy
      AND ED.year = AD.year  
      AND ED.item_id = allitems.item_id
WHERE AD.year = 2015
GROUP BY
     AD.year, AD.doy, allitems.item_id
ORDER BY 
     AD.year, AD.doy, allitems.item_id

You will have this OUTPUT

| year | doy | item_id | max_sales |
|------|-----|---------|-----------|
| 2015 |   1 |      A1 |        20 |
| 2015 |   1 |      A2 |        11 |
| 2015 |   2 |      A1 |         0 |
| 2015 |   2 |      A2 |         0 |
| 2015 |   3 |      A1 |         0 |
| 2015 |   3 |      A2 |         0 |
| 2015 |   4 |      A1 |         0 |
| 2015 |   4 |      A2 |         0 |
| 2015 |   5 |      A1 |         0 |
| 2015 |   5 |      A2 |         0 |
| 2015 |   6 |      A1 |         0 |
| 2015 |   6 |      A2 |         0 |
| 2015 |   7 |      A1 |        39 |
| 2015 |   7 |      A2 |         0 |
| 2015 |   8 |      A1 |         0 |
| 2015 |   8 |      A2 |         0 |
| 2015 |   9 |      A1 |         0 |
| 2015 |   9 |      A2 |         0 |
| 2015 |  10 |      A1 |         0 |
| 2015 |  10 |      A2 |         0 |

Then you need install tablefunc

and use crosstab to pivot this table SAMPLE

like image 25
Juan Carlos Oropeza Avatar answered Oct 23 '22 14:10

Juan Carlos Oropeza


Solution 1. Simple query with an aggregate.

The simplest and fastest way to get the expected result. It is easy to parse the sales column within a client program.

select item, string_agg(coalesce(sales, 0)::text, ',') sales
from (
    select distinct item_id item, doy
    from generate_series (1, 10) doy  -- change 10 to given n
    cross join entry_daily
    ) sub
left join entry_daily on item_id = item and day_of_year = doy
group by 1
order by 1;

 item |        sales         
------+----------------------
 A1   | 20,0,0,0,0,0,9,0,0,0
 A2   | 11,0,0,0,0,0,0,0,0,0
(2 rows)

Solution 2. Dynamically created view.

Based on the solution 1 with array_agg() instead of string_agg(). The function creates a view with a given number of columns.

create or replace function create_items_view(view_name text, days int)
returns void language plpgsql as $$
declare
    list text;
begin
    select string_agg(format('s[%s] "%s"', i::text, i::text), ',')
    into list
    from generate_series(1, days) i;

    execute(format($f$
        drop view if exists %s;
        create view %s as select item, %s
        from (
            select item, array_agg(coalesce(sales, 0)) s
            from (
                select distinct item_id item, doy
                from generate_series (1, %s) doy
                cross join entry_daily
                ) sub
            left join entry_daily on item_id = item and day_of_year = doy
            group by 1
            order by 1
        ) q
        $f$, view_name, view_name, list, days)
    );
end $$;

Usage:

select create_items_view('items_view_10', 10);

select * from items_view_10;

 item | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 
------+----+---+---+---+---+---+---+---+---+----
 A1   | 20 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 |  0
 A2   | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  0
(2 rows)

Solution 3. Crosstab.

Easy to use, but very uncomfortable with the greater number of columns due to the need to define the row format.

create extension if not exists tablefunc;

select * from crosstab (
    'select item_id, day_of_year, sales
    from entry_daily
    order by 1',
    'select i from generate_series (1, 10) i'
) as ct 
(item_id text, "1" int, "2" int, "3" int, "4" int, "5" int, "6" int, "7" int, "8" int, "9" int, "10" int);

 item_id | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 
---------+----+---+---+---+---+---+---+---+---+----
 A1      | 20 |   |   |   |   |   | 9 |   |   |   
 A2      | 11 |   |   |   |   |   |   |   |   |   
(2 rows)
like image 64
klin Avatar answered Oct 23 '22 12:10

klin