It appears that SPLIT() treats empty values as though they don't exist at all, which yields unexpected results.
For example:
SELECT
NTH(3, SPLIT(values, ","))
FROM
(SELECT "a,b,,d,e" as values)
returns "d", when I would expect it to return NULL. You can see how this would be problematic for several rows of comma-delimited text. One would expect the following query to return NULL and "c", but it doesn't:
SELECT
NTH(3, SPLIT(values, ","))
FROM
(SELECT "a,b,,d,e" as values),
(SELECT "a,,c,d,e" as values)
Rather, it returns "d" and "d".
Is this behavior by design, can it be altered, or is there a better way to do what I'm doing?
As Mosha explains, obtaining NULLs from a SPLIT() operation is not possible. As an alternative, is getting empty spaces an option?
SELECT
NTH(7, SPLIT(values, ","))
FROM
(SELECT REGEXP_REPLACE(values, ',,', ', ,') values FROM
(SELECT "a,b,,d,e,,g" as values),
(SELECT "a,,c,d,e,f,g" as values),
(SELECT "a,,c,d,e,f,," as values),
(SELECT "a,,c,d,e,f," as values),
)
Row f0_
1 g
2 g
3
4 null
This is By Design behavior, and it is not specific to SPLIT function, but to REPEATED fields in general. BigQuery REPEATED fields cannot store NULLs (same behavior as in protocol buffers), therefore nothing that SPLIT does can make NULLs appear inside REPEATED fields.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With