Here is the database layout. I have a table with sparse sales over time, aggregated per day. If an item has 10 sales on 01-01-2015, there is an entry; if it has 0 sales, there is no entry. Something like this:
|-------------|------|-------|---------|
| day_of_year | year | sales | item_id |
|-------------|------|-------|---------|
| 01          | 2015 | 20    | A1      |
| 01          | 2015 | 11    | A2      |
| 07          | 2015 | 09    | A1      |
| ...         | ...  | ...   | ...     |
|-------------|------|-------|---------|
This is how I get a time series for one item:
SELECT doy, max(sales) FROM (
    SELECT day_of_year AS doy,
           sales
    FROM myschema.entry_daily
    WHERE item_id = 'theNameOfmyItem'
      AND year = 2015
      AND day_of_year < 150
    UNION
    SELECT doy,
           0 AS sales
    FROM generate_series(1, 149) AS doy
) AS t
GROUP BY doy
ORDER BY doy;
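For illustration, the gap-filling that the UNION/generate_series trick performs can be sketched client-side in Python (the sparse data here is hypothetical, mimicking the table above):

```python
# Dense time series from sparse per-day sales, mirroring the
# generate_series + UNION + max() trick in the query above.
sparse = {1: 20, 7: 9}  # hypothetical rows for one item: day_of_year -> sales

# For each day 1..149, take the recorded sales, or 0 when no row exists.
dense = [sparse.get(doy, 0) for doy in range(1, 150)]

print(dense[0], dense[6], dense[1])  # 20 9 0
```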
I currently loop in R, issuing one query per item, and then aggregate the results into a data frame. But this is very slow. I would like a single query that aggregates all the data in the following form:
|---------|----|----|----|----|----|-----|-----|
| item_id | 01 | 02 | 03 | 04 | 05 | ... | 149 |
|---------|----|----|----|----|----|-----|-----|
| A1      | 10 | 00 | 00 | 05 | 12 | ... | 11  |
| A2      | 11 | 00 | 30 | 01 | 15 | ... | 09  |
| A3      | 20 | 00 | 00 | 05 | 17 | ... | 20  |
| ...     |    |    |    |    |    |     |     |
|---------|----|----|----|----|----|-----|-----|
Would this be possible? By the way I am using a Postgres database.
First you need a calendar table with all dates to fill in the blank days. 100 years of dates is about 36,500 rows, so it is not very big, and it saves recalculating the series every time.
allDates:
date_id
s_date
or with the derived fields precomputed:
date_id
s_date
doy = EXTRACT(DOY FROM s_date)
year = EXTRACT(YEAR FROM s_date)
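As a sketch of what such a calendar table contains, the rows for a date range can be generated like this (Python, illustrative only; the column names follow the layout above):

```python
from datetime import date, timedelta

def all_dates(start: date, end: date):
    """Yield (date_id, s_date, doy, year) rows for the calendar table."""
    d, date_id = start, 1
    while d <= end:
        # tm_yday is the day of year, matching EXTRACT(DOY FROM s_date)
        yield (date_id, d, d.timetuple().tm_yday, d.year)
        d += timedelta(days=1)
        date_id += 1

rows = list(all_dates(date(2015, 1, 1), date(2015, 12, 31)))
# 365 rows; first row: (1, date(2015, 1, 1), 1, 2015)
```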
Your base query will be (SQL Fiddle demo):
SELECT AD.year,
       AD.doy,
       allitems.item_id,
       COALESCE(SUM(ED.sales), 0) AS max_sales
FROM (SELECT DISTINCT item_id
      FROM entry_daily) AS allitems
CROSS JOIN alldates AD
LEFT JOIN entry_daily ED
       ON ED.day_of_year = AD.doy
      AND ED.year = AD.year
      AND ED.item_id = allitems.item_id
WHERE AD.year = 2015
GROUP BY AD.year, AD.doy, allitems.item_id
ORDER BY AD.year, AD.doy, allitems.item_id;
You will get this output:
| year | doy | item_id | max_sales |
|------|-----|---------|-----------|
| 2015 | 1 | A1 | 20 |
| 2015 | 1 | A2 | 11 |
| 2015 | 2 | A1 | 0 |
| 2015 | 2 | A2 | 0 |
| 2015 | 3 | A1 | 0 |
| 2015 | 3 | A2 | 0 |
| 2015 | 4 | A1 | 0 |
| 2015 | 4 | A2 | 0 |
| 2015 | 5 | A1 | 0 |
| 2015 | 5 | A2 | 0 |
| 2015 | 6 | A1 | 0 |
| 2015 | 6 | A2 | 0 |
| 2015 | 7 | A1 | 39 |
| 2015 | 7 | A2 | 0 |
| 2015 | 8 | A1 | 0 |
| 2015 | 8 | A2 | 0 |
| 2015 | 9 | A1 | 0 |
| 2015 | 9 | A2 | 0 |
| 2015 | 10 | A1 | 0 |
| 2015 | 10 | A2 | 0 |
Then you need to install the tablefunc extension
and use crosstab() to pivot this table (sample).
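The pivot that crosstab performs on the long output above can be sketched in plain Python (using a hypothetical subset of the rows shown):

```python
# Pivot long (doy, item_id, sales) rows into one row per item,
# the same reshaping crosstab does. Rows are a hypothetical subset
# of the output table above.
long_rows = [(1, "A1", 20), (1, "A2", 11), (7, "A1", 39)]
n_days = 10

wide = {}
for doy, item, sales in long_rows:
    # One slot per day, prefilled with 0; doy is 1-based.
    wide.setdefault(item, [0] * n_days)[doy - 1] = sales

print(wide["A1"])  # [20, 0, 0, 0, 0, 0, 39, 0, 0, 0]
```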
This is the simplest and fastest way to get the expected result, and it is easy to parse the sales
column within a client program.
select item, string_agg(coalesce(sales, 0)::text, ',') sales
from (
select distinct item_id item, doy
from generate_series (1, 10) doy -- change 10 to given n
cross join entry_daily
) sub
left join entry_daily on item_id = item and day_of_year = doy
group by 1
order by 1;
item | sales
------+----------------------
A1 | 20,0,0,0,0,0,9,0,0,0
A2 | 11,0,0,0,0,0,0,0,0,0
(2 rows)
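Parsing the comma-separated sales column back into numbers on the client side is then trivial, e.g. in Python (the row tuple here is a hypothetical result as returned by the query above):

```python
# One result row from the string_agg() query: (item, "v1,v2,...").
row = ("A1", "20,0,0,0,0,0,9,0,0,0")

item, sales_csv = row
# Split the aggregated string back into a list of integers per day.
sales = [int(s) for s in sales_csv.split(",")]

print(item, sales)  # A1 [20, 0, 0, 0, 0, 0, 9, 0, 0, 0]
```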
Based on solution 1, but with array_agg() instead of string_agg(). The function creates a view with a given number of columns.
create or replace function create_items_view(view_name text, days int)
returns void language plpgsql as $$
declare
list text;
begin
select string_agg(format('s[%s] "%s"', i::text, i::text), ',')
into list
from generate_series(1, days) i;
execute(format($f$
drop view if exists %s;
create view %s as select item, %s
from (
select item, array_agg(coalesce(sales, 0)) s
from (
select distinct item_id item, doy
from generate_series (1, %s) doy
cross join entry_daily
) sub
left join entry_daily on item_id = item and day_of_year = doy
group by 1
order by 1
) q
$f$, view_name, view_name, list, days)
);
end $$;
Usage:
select create_items_view('items_view_10', 10);
select * from items_view_10;
item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
------+----+---+---+---+---+---+---+---+---+----
A1 | 20 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0
A2 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(2 rows)
Easy to use, but unwieldy with a greater number of columns because of the need to spell out the row output format.
create extension if not exists tablefunc;
select * from crosstab (
'select item_id, day_of_year, sales
from entry_daily
order by 1',
'select i from generate_series (1, 10) i'
) as ct
(item_id text, "1" int, "2" int, "3" int, "4" int, "5" int, "6" int, "7" int, "8" int, "9" int, "10" int);
item_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
---------+----+---+---+---+---+---+---+---+---+----
A1 | 20 | | | | | | 9 | | |
A2 | 11 | | | | | | | | |
(2 rows)