I have a table of hourly product usage (how many times the product is used) data –
ID (bigint)| ProductId (tinyint)| Date (int - YYYYMMDD) | Hour (tinyint)| UsageCount (int) #|1 | 20140901 | 0 | 10 #|1 | 20140901 | 1 | 15 #|1 | 20140902 | 5 | 25 #|1 | 20140903 | 5 | 25 #|1 | 20140904 | 3 | 25 #|1 | 20140905 | 7 | 25 #|1 | 20140906 | 10 | 25 #|1 | 20140907 | 9 | 25 #|1 | 20140908 | 5 | 25 #|2 | 20140903 | 16 | 10 #|2 | 20140903 | 13 | 115
Likewise, I have the usage data for 4 different products (ProductId from 1 through 4) stored for every hour in the product_usage table. As you can imagine, it is constantly growing as the nightly ETL process dumps the data for the entire previous day. If a product is not used on any hour of a day, the record for that hour won’t appear in this table. Similarly, if a product is not used for the entire day, there won’t be any record for that day in the table. I need to generate a report that gives daily usage and last 7 days’ rolling average –
For example:
ProductId | Date | DailyUsage | RollingAverage 1 | 20140901 | sum of usages of that day | (Sum of usages from 20140901 through 20140826) / 7 1 | 20140901 | sum of usages of that day | (Sum of usages from 20140901 through 20140826) / 7 1 | 20140902 | sum of usages of that day | (Sum of usages from 20140902 through 20140827) / 7 2 | 20140902 | sum of usages of that day | (Sum of usages from 20140902 through 20140827) / 7
And so on.. I am planning to create an Indexed View in SQL server 2014. Can you think of an efficient SQL query to do this?
The second part of the query (in black text) is the calculation of the rolling average. Similarly to the first example, we use the AVG() window function and the clause OVER(ORDER BY day ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) . This applies the AVG() function to the current row and the nine rows before it.
For a 7-day moving average, it takes the last 7 days, adds them up, and divides it by 7. For a 14-day average, it will take the past 14 days. So, for example, we have data on COVID starting March 12. For the 7-day moving average, it needs 7 days of COVID cases: that is the reason it only starts on March 19.
To do so, we calculate the average of the stock prices from three consecutive days—the day in question and the two previous days—then repeat the same for each day in the data set. This is a three-day moving average, because we average over a period of three days.
Try:
select x.*, avg(dailyusage) over(partition by productid order by productid, date rows between 6 preceding and current row) as rolling_avg from (select productid, date, sum(usagecount) as dailyusage from tbl group by productid, date) x
Fiddle:
http://sqlfiddle.com/#!6/f674a7/4/0
Replace "avg(dailusage) over...." with sum (rather than avg) if what you really want is the sum for the past week. In your title you say you want the average but later you say you want the sum. The query should be the same other than that, so use whichever you actually want.
As was pointed out by Gordon this is basically the average of the past 6 dates in which the product was used, which might be more than just the past 6 days if there are days without any rows for that product on the table because it wasn't used at all. To get around that you could use a date table and your products table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With