Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery FIRST_VALUE and IGNORE_NULLS - why it works this way?

My problem is to find first value from column over some window, here is example data with query:

WITH finishers AS
 (SELECT 'Bob' as name,
  TIMESTAMP '2016-10-18 2:51:45' as finish_time,
  'F30-34' as division
  UNION ALL SELECT NULL, TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
  UNION ALL SELECT 'Mary', TIMESTAMP '2016-10-18 2:59:01', 'F35-39'
  UNION ALL SELECT 'John', TIMESTAMP '2016-10-18 3:01:17', 'F35-39')
SELECT *,
 FIRST_VALUE (name IGNORE NULLS) OVER(PARTITION BY division ORDER BY finish_time) AS fastest_in_division
FROM finishers
ORDER by division

The outcome is:

Row name    finish_time             division  fastest_in_division    
1   Bob     2016-10-18 02:51:45 UTC F30-34    Bob    
2   null    2016-10-18 02:54:11 UTC F35-39    **null**   
3   Mary    2016-10-18 02:59:01 UTC F35-39    Mary   
4   John    2016-10-18 03:01:17 UTC F35-39    Mary   

While my expectation is:

Row name    finish_time             division  fastest_in_division    
1   Bob     2016-10-18 02:51:45 UTC F30-34    Bob    
2   null    2016-10-18 02:54:11 UTC F35-39    **Mary**
3   Mary    2016-10-18 02:59:01 UTC F35-39    Mary   
4   John    2016-10-18 03:01:17 UTC F35-39    Mary   

It seems that IGNORE_NULLS skips rows when 'name' is null AND is first in order - then it returns 'null' instead of 'Mary', as in other rows. Is there any way to bypass this behaviour?

like image 408
Rafał Wroński Avatar asked Oct 26 '17 19:10

Rafał Wroński


People also ask

Does First_value ignore nulls?

If the first value in the set is null, then the function returns NULL unless you specify IGNORE NULLS . This setting is useful for data densification. If you specify IGNORE NULLS , then FIRST_VALUE returns the fist non-null value in the set, or NULL if all values are null.

How do you filter out null values in BigQuery?

The syntax is as follows: Select * from table_source where column is not NULL; If you want to read more about the where operator, please refer to the documentation. In addition if you want to replace the null values, you can use the IFNNULL() function.

How do I check if a value is null in BigQuery?

If expr evaluates to NULL , returns null_result . Otherwise, returns expr . If expr doesn't evaluate to NULL , null_result isn't evaluated. expr and null_result can be any type and must be implicitly coercible to a common supertype.

Which aggregate function will concatenate values and ignore null values?

The GROUP_CONCAT() function is an aggregate function that concatenates all non-null values in a column. In this syntax: expression is a column or an expression that will be used for concatenation.


1 Answers

To achive your expectation the query should look like below

#standardSQL
WITH finishers AS  (
  SELECT 'Bob' AS name, TIMESTAMP '2016-10-18 2:51:45' AS finish_time, 'F30-34' AS division UNION ALL 
  SELECT NULL, TIMESTAMP '2016-10-18 2:54:11', 'F35-39' UNION ALL 
  SELECT 'Mary', TIMESTAMP '2016-10-18 2:59:01', 'F35-39' UNION ALL 
  SELECT 'John', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
)
SELECT *,
  FIRST_VALUE (name IGNORE NULLS) 
    OVER(PARTITION BY division ORDER BY finish_time 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS fastest_in_division
FROM finishers
ORDER BY finish_time, division

with result as you expected:

Row name    finish_time             division    fastest_in_division  
1   Bob     2016-10-18 02:51:45 UTC F30-34      Bob  
2   null    2016-10-18 02:54:11 UTC F35-39      Mary     
3   Mary    2016-10-18 02:59:01 UTC F35-39      Mary     
4   John    2016-10-18 03:01:17 UTC F35-39      Mary

The problem you had because by default - the scope of OVER with ORDER BY is BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in respective partition based on given order, but looks like you expect the whole partition to be involved

like image 104
Mikhail Berlyant Avatar answered Oct 14 '22 06:10

Mikhail Berlyant