How to ignore nulls in BigQuery using LAG()?

When using LAG() (in BigQuery standard SQL), how do you skip NULL values so that it takes the first preceding value which is not NULL?

I've prepared some sample rows in the same format as the source table, but obfuscated. In the example, it only works for rows that do not have a preceding NULL value. Specifically, rows 3 & 4 should be assigned '2017-01-25 04:02:36' (as was the case for row 5), but they are NULL.

This makes sense, because LAG() only looks at the immediately preceding row. However, surely there is an easy way to specify something like IGNORE_NULLS?

--TEMP
with example as (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 03:19:50') as col_c, 'val_1' as col_d 
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 03:19:50') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 03:19:50') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:23') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:23') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:23') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:59') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:59') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:59') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:36') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:36') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:36') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:55') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:55') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 07:16:58') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 07:16:58') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:35:39') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:35:39') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:46:48') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:46:48') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:46:48') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 10:47:48') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 10:47:48') as col_c, 'val_3' as col_d))
--TEMP
SELECT col_a, col_b, col_c,
  case when val_1_transposed is null then LAG(val_1_transposed) over (order by col_c) else val_1_transposed end as val_1_transposed,
  case when val_2_transposed is null then LAG(val_2_transposed) over (order by col_c) else val_2_transposed end as val_2_transposed,
  case when val_3_transposed is null then LAG(val_3_transposed) over (order by col_c) else val_3_transposed end as val_3_transposed
FROM (
  SELECT col_a, col_b, col_c,
    MAX(IF(col_d = 'val_1', col_c, NULL)) AS val_1_transposed,
    MAX(IF(col_d = 'val_2', col_c, NULL)) AS val_2_transposed,
    MAX(IF(col_d = 'val_3', col_c, NULL)) AS val_3_transposed
  FROM (
    SELECT col_a, col_b, col_c, col_d FROM example) GROUP BY 1,2,3) ORDER BY col_c DESC

asked Apr 24 '17 10:04

Graham Polley

1 Answer

There are two solutions, described in detail here: http://sqlmag.com/t-sql/last-non-null-puzzle

I adapted one of the solutions. It uses a MAX window aggregate function to return the maximum relevant ordering key (the id in the article) seen so far. With ROWS UNBOUNDED PRECEDING, the running MAX steps up to a new level at every non-NULL row, and that level is carried over the following NULL rows, so it can be used to replace the NULL lag entries.
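To make the idea easier to follow, here is a minimal sketch on toy data. The names t, ts, val, grp and val_filled are made up for illustration and are not part of the question's schema, and the sketch assumes the ordering column ts is unique and never NULL.

```sql
-- Hypothetical toy data: ts is the ordering key, val is sparse.
WITH t AS (
  SELECT 1 AS ts, 'a' AS val UNION ALL
  SELECT 2, CAST(NULL AS STRING) UNION ALL
  SELECT 3, CAST(NULL AS STRING) UNION ALL
  SELECT 4, 'b' UNION ALL
  SELECT 5, CAST(NULL AS STRING)
),
grouped AS (
  SELECT ts, val,
    -- Running MAX of the key, counting only rows where val is present:
    -- each row receives the key of the latest non-NULL row at or before it.
    MAX(IF(val IS NOT NULL, ts, NULL))
      OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
  FROM t
)
SELECT ts, val,
  -- Every grp contains at most one non-NULL val, so MAX just picks it up.
  MAX(val) OVER (PARTITION BY grp) AS val_filled
FROM grouped
ORDER BY ts
-- val_filled for ts 1..5: a, a, a, b, b
```

Rows that come before the first non-NULL value end up with a NULL grp and therefore stay NULL, which is usually what you want.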

--TEMP
with example as (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 03:19:50') as col_c, 'val_1' as col_d 
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 03:19:50') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 03:19:50') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:23') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:23') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:23') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:59') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:59') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:01:59') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:36') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:36') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:36') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:55') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 04:02:55') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 07:16:58') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 07:16:58') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:35:39') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:35:39') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:46:48') as col_c, 'val_1' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:46:48') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 09:46:48') as col_c, 'val_3' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 10:47:48') as col_c, 'val_2' as col_d)
UNION ALL (select 'some_id' as col_a, 'foo' as col_b, timestamp('2017-01-25 10:47:48') as col_c, 'val_3' as col_d))
--TEMP
SELECT col_a, col_b, col_c,
  case when val_1_transposed is null then LAG(val_1_transposed) over (order by col_c) else val_1_transposed end as val_1_transposed,
  case when val_2_transposed is null then LAG(val_2_transposed) over (order by col_c) else val_2_transposed end as val_2_transposed,
  case when val_3_transposed is null then LAG(val_3_transposed) over (order by col_c) else val_3_transposed end as val_3_transposed,
  MAX(val_2_transposed) OVER( PARTITION BY grp ORDER BY col_c ROWS UNBOUNDED PRECEDING ) as lag_ignored_nulls
FROM (
select *, 
  -- grp = col_c of the latest row (in col_c order) that has a non-NULL val_2_transposed
  MAX(CASE WHEN val_2_transposed IS NOT NULL THEN col_c END ) OVER( ORDER BY col_c ROWS UNBOUNDED PRECEDING ) AS grp
            from (
  SELECT col_a, col_b, col_c,
    MAX(IF(col_d = 'val_1', col_c, NULL)) AS val_1_transposed,
    MAX(IF(col_d = 'val_2', col_c, NULL)) AS val_2_transposed,
    MAX(IF(col_d = 'val_3', col_c, NULL)) AS val_3_transposed
  FROM (
    SELECT col_a, col_b, col_c, col_d FROM example) GROUP BY 1,2,3)) ORDER BY col_c DESC
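As an alternative to the grouping trick, the BigQuery standard SQL documentation lists an IGNORE NULLS modifier for LAST_VALUE (and for FIRST_VALUE and NTH_VALUE), which gets close to what the question asks for. As far as I know LAG itself does not accept it, so this is a different technique from the one above. A minimal sketch, again on hypothetical toy data (t, ts, val and val_filled are made-up names):

```sql
-- Hypothetical toy data: ts is the ordering key, val is sparse.
WITH t AS (
  SELECT 1 AS ts, 'a' AS val UNION ALL
  SELECT 2, CAST(NULL AS STRING) UNION ALL
  SELECT 3, CAST(NULL AS STRING) UNION ALL
  SELECT 4, 'b' UNION ALL
  SELECT 5, CAST(NULL AS STRING)
)
SELECT ts, val,
  -- Last non-NULL val among all rows up to and including the current one.
  LAST_VALUE(val IGNORE NULLS)
    OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS val_filled
FROM t
ORDER BY ts
-- val_filled for ts 1..5: a, a, a, b, b
```

To mimic LAG more strictly (the previous non-NULL value, excluding the current row), the frame can end at 1 PRECEDING instead of CURRENT ROW.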

answered Sep 25 '22 11:09

Pentium10
