I know there is an AVG function in Big Query and there are window functions to shift previous or next values up or down a row, but is there any function that allows you to average over a specified interval? For instance I'd like to so something like the following:
SELECT
city
AVG(temperature) OVER(PARTITION BY city, INTERVAL day,14, ORDER BY day) as rolling_avg_14_days,
AVG(temperature) OVER(PARTITION BY city, INTERVAL day,30, ORDER BY day) as rolling_avg_30_days,
WHERE
city IN ("Los Angeles","Chicago","Sun Prairie","Sunnyvale")
AND year BETWEEN 1900 AND 2013
I'd like to do a rolling average calculation that allows me to specify a range of values to do an aggregation function over, and what value to order by. The average function would take the current day temp and previous 13 days (or previous 29) to calculate and average. Is this possible today? I know I could do something like this if I put 13 LAG/OVER fields in the SELECT statement and then avg the results of all of them, but that is a lot of overhead.
The second part of the query (in black text) is the calculation of the rolling average. Similarly to the first example, we use the AVG() window function and the clause OVER(ORDER BY day ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) . This applies the AVG() function to the current row and the nine rows before it.
A rolling average continuously updates the average of a data set to include all the data in the set until that point. For example, the rolling average of return quantities at March 2012 would be calculated by adding the return quantities in January, February, and March, and then dividing that sum by three.
A moving average means that it takes the past days of numbers, takes the average of those days, and plots it on the graph. For a 7-day moving average, it takes the last 7 days, adds them up, and divides it by 7. For a 14-day average, it will take the past 14 days.
To do so, we calculate the average of the stock prices from three consecutive days—the day in question and the two previous days—then repeat the same for each day in the data set. This is a three-day moving average, because we average over a period of three days.
I think OVER with RANGE
construction of Window Functions is best fit here
Assuming that day field is represented as 'YYYY-MM-DD' format, below query does the rolling averages
SELECT
city,
day,
AVG(temperature) OVER(PARTITION BY city ORDER BY ts
RANGE BETWEEN 14*24*3600 PRECEDING AND CURRENT ROW) AS rolling_avg_14_days,
AVG(temperature) OVER(PARTITION BY city ORDER BY ts
RANGE BETWEEN 30*24*3600 PRECEDING AND CURRENT ROW) AS rolling_avg_30_days
FROM (
SELECT day, city, temperature, TIMESTAMP_TO_SEC(TIMESTAMP(day)) AS ts
FROM temperatures
)
You most likely already found this solution long ago, but still wanted to have what i think a better answer (as of today) here for this question
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With