Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google Big Query: Forward Filling: IGNORE in Window Function

I am having difficulty using a window function to "forward fill" values in Google Big Query. It appears that the IGNORE part of the query is causing the problem. Assuming the dataset below, I am trying to get the "purchase dates" values for each social security number to fill in the nulls until another validate purchase date is encountered. Basically want to keep track of the latest purchase date in relation to each activity_date.

For example, looking at SS # 000-0000-000, I would like 2016-12-16 to forward fill all of the null values between activity dates 2016-12-17 to 2016-12-22 (purchase ages 1-6) and then forward fill 2016-12-23 for purchase ages 8-10. Same scenario for the other SS # 111-1111-111.

Activity_date   SS_Number   First_Purchase  First_Purchase_age  Purchase_dates
2016-12-26  000-0000-000    2016-12-16  10  null
2016-12-25  000-0000-000    2016-12-16  9   null
2016-12-24  000-0000-000    2016-12-16  8   null
2016-12-23  000-0000-000    2016-12-16  7   2016-12-23
2016-12-22  000-0000-000    2016-12-16  6   null
2016-12-21  000-0000-000    2016-12-16  5   null
2016-12-20  000-0000-000    2016-12-16  4   null
2016-12-19  000-0000-000    2016-12-16  3   null
2016-12-18  000-0000-000    2016-12-16  2   null
2016-12-17  000-0000-000    2016-12-16  1   null
2016-12-16  000-0000-000    2016-12-16  0   2016-12-16
2016-11-26  111-1111-111    2016-11-16  10  null
2016-11-25  111-1111-111    2016-11-16  9   null
2016-11-24  111-1111-111    2016-11-16  8   null
2016-11-23  111-1111-111    2016-11-16  7   2016-11-23
2016-11-22  111-1111-111    2016-11-16  6   null
2016-11-21  111-1111-111    2016-11-16  5   null
2016-11-20  111-1111-111    2016-11-16  4   null
2016-11-19  111-1111-111    2016-11-16  3   null
2016-11-18  111-1111-111    2016-11-16  2   null
2016-11-17  111-1111-111    2016-11-16  1   null
2016-11-16  111-1111-111    2016-11-16  0   2016-11-16

Here is the query I have:

SELECT
  activity_date,
  ss_number,
  first_purchase,
  first_purchase_age,
  purchase_dates,
  LAST_VALUE(purchase_dates) IGNORE NULLS OVER (PARTITION BY ss_number ORDER BY activity_date DESC ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW),
FROM
  [TABLE]
like image 773
echoecho256 Avatar asked Dec 24 '22 21:12

echoecho256


1 Answers

Below gives you what you described

#standardSQL
SELECT
  activity_date,
  ss_number,
  first_purchase,
  first_purchase_age,
  purchase_dates,
  MAX(purchase_dates) OVER (PARTITION BY ss_number ORDER BY activity_date 
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS filled_purchase_dates
FROM   yourTable

Above "workaround" works for your specific use case, but in other cases can require more coding, thus ability IGNORE NULLS still would be great!
So, as of IGNORE in Window Function - see Support IGNORE NULLS/RESPECT NULLS for analytic and aggregate functions ticket. The expected syntax would be

SELECT LAST_VALUE(x IGNORE NULLS) OVER (...) FROM yourTable
like image 122
Mikhail Berlyant Avatar answered Dec 26 '22 09:12

Mikhail Berlyant