I am having difficulty using a window function to "forward fill" values in Google Big Query. It appears that the IGNORE part of the query is causing the problem. Assuming the dataset below, I am trying to get the "purchase dates" values for each social security number to fill in the nulls until another validate purchase date is encountered. Basically want to keep track of the latest purchase date in relation to each activity_date.
For example, looking at SS # 000-0000-000, I would like 2016-12-16 to forward fill all of the null values between activity dates 2016-12-17 to 2016-12-22 (purchase ages 1-6) and then forward fill 2016-12-23 for purchase ages 8-10. Same scenario for the other SS # 111-1111-111.
Activity_date SS_Number First_Purchase First_Purchase_age Purchase_dates
2016-12-26 000-0000-000 2016-12-16 10 null
2016-12-25 000-0000-000 2016-12-16 9 null
2016-12-24 000-0000-000 2016-12-16 8 null
2016-12-23 000-0000-000 2016-12-16 7 2016-12-23
2016-12-22 000-0000-000 2016-12-16 6 null
2016-12-21 000-0000-000 2016-12-16 5 null
2016-12-20 000-0000-000 2016-12-16 4 null
2016-12-19 000-0000-000 2016-12-16 3 null
2016-12-18 000-0000-000 2016-12-16 2 null
2016-12-17 000-0000-000 2016-12-16 1 null
2016-12-16 000-0000-000 2016-12-16 0 2016-12-16
2016-11-26 111-1111-111 2016-11-16 10 null
2016-11-25 111-1111-111 2016-11-16 9 null
2016-11-24 111-1111-111 2016-11-16 8 null
2016-11-23 111-1111-111 2016-11-16 7 2016-11-23
2016-11-22 111-1111-111 2016-11-16 6 null
2016-11-21 111-1111-111 2016-11-16 5 null
2016-11-20 111-1111-111 2016-11-16 4 null
2016-11-19 111-1111-111 2016-11-16 3 null
2016-11-18 111-1111-111 2016-11-16 2 null
2016-11-17 111-1111-111 2016-11-16 1 null
2016-11-16 111-1111-111 2016-11-16 0 2016-11-16
Here is the query I have:
SELECT
activity_date,
ss_number,
first_purchase,
first_purchase_age,
purchase_dates,
LAST_VALUE(purchase_dates) IGNORE NULLS OVER (PARTITION BY ss_number ORDER BY activity_date DESC ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW),
FROM
[TABLE]
Below gives you what you described
#standardSQL
SELECT
activity_date,
ss_number,
first_purchase,
first_purchase_age,
purchase_dates,
MAX(purchase_dates) OVER (PARTITION BY ss_number ORDER BY activity_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS filled_purchase_dates
FROM yourTable
Above "workaround" works for your specific use case, but in other cases can require more coding, thus ability IGNORE NULLS still would be great!
So, as of IGNORE
in Window Function - see Support IGNORE NULLS/RESPECT NULLS for analytic and aggregate functions ticket. The expected syntax would be
SELECT LAST_VALUE(x IGNORE NULLS) OVER (...) FROM yourTable
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With