My problem is to find first value from column over some window, here is example data with query:
WITH finishers AS
(SELECT 'Bob' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT NULL, TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Mary', TIMESTAMP '2016-10-18 2:59:01', 'F35-39'
UNION ALL SELECT 'John', TIMESTAMP '2016-10-18 3:01:17', 'F35-39')
SELECT *,
FIRST_VALUE (name IGNORE NULLS) OVER(PARTITION BY division ORDER BY finish_time) AS fastest_in_division
FROM finishers
ORDER by division
The outcome is:
Row name finish_time division fastest_in_division
1 Bob 2016-10-18 02:51:45 UTC F30-34 Bob
2 null 2016-10-18 02:54:11 UTC F35-39 **null**
3 Mary 2016-10-18 02:59:01 UTC F35-39 Mary
4 John 2016-10-18 03:01:17 UTC F35-39 Mary
While my expectation is:
Row name finish_time division fastest_in_division
1 Bob 2016-10-18 02:51:45 UTC F30-34 Bob
2 null 2016-10-18 02:54:11 UTC F35-39 **Mary**
3 Mary 2016-10-18 02:59:01 UTC F35-39 Mary
4 John 2016-10-18 03:01:17 UTC F35-39 Mary
It seems that IGNORE_NULLS skips rows when 'name' is null AND is first in order - then it returns 'null' instead of 'Mary', as in other rows. Is there any way to bypass this behaviour?
If the first value in the set is null, then the function returns NULL unless you specify IGNORE NULLS . This setting is useful for data densification. If you specify IGNORE NULLS , then FIRST_VALUE returns the fist non-null value in the set, or NULL if all values are null.
The syntax is as follows: Select * from table_source where column is not NULL; If you want to read more about the where operator, please refer to the documentation. In addition if you want to replace the null values, you can use the IFNNULL() function.
If expr evaluates to NULL , returns null_result . Otherwise, returns expr . If expr doesn't evaluate to NULL , null_result isn't evaluated. expr and null_result can be any type and must be implicitly coercible to a common supertype.
The GROUP_CONCAT() function is an aggregate function that concatenates all non-null values in a column. In this syntax: expression is a column or an expression that will be used for concatenation.
To achive your expectation the query should look like below
#standardSQL
WITH finishers AS (
SELECT 'Bob' AS name, TIMESTAMP '2016-10-18 2:51:45' AS finish_time, 'F30-34' AS division UNION ALL
SELECT NULL, TIMESTAMP '2016-10-18 2:54:11', 'F35-39' UNION ALL
SELECT 'Mary', TIMESTAMP '2016-10-18 2:59:01', 'F35-39' UNION ALL
SELECT 'John', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
)
SELECT *,
FIRST_VALUE (name IGNORE NULLS)
OVER(PARTITION BY division ORDER BY finish_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS fastest_in_division
FROM finishers
ORDER BY finish_time, division
with result as you expected:
Row name finish_time division fastest_in_division
1 Bob 2016-10-18 02:51:45 UTC F30-34 Bob
2 null 2016-10-18 02:54:11 UTC F35-39 Mary
3 Mary 2016-10-18 02:59:01 UTC F35-39 Mary
4 John 2016-10-18 03:01:17 UTC F35-39 Mary
The problem you had because by default - the scope of OVER with ORDER BY is BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in respective partition based on given order, but looks like you expect the whole partition to be involved
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With