Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery SPLIT() ignores empty values

It appears that SPLIT() treats empty values as though they don't exist at all, which yields unexpected results.

For example:

SELECT
  NTH(3, SPLIT(values, ","))
FROM
  (SELECT "a,b,,d,e" as values)

returns "d", when I would expect it to return NULL. You can see how this would be problematic for several rows of comma-delimited text. One would expect the following query to return NULL and "c", but it doesn't:

SELECT
  NTH(3, SPLIT(values, ","))
FROM
  (SELECT "a,b,,d,e" as values),
  (SELECT "a,,c,d,e" as values)

Rather, it returns "d" and "d".

Is this behavior by design, can it be altered, or is there a better way to do what I'm doing?

like image 407
Michael Ames Avatar asked Oct 20 '22 10:10

Michael Ames


2 Answers

As Mosha explains, obtaining NULLs from a SPLIT() operation is not possible. As an alternative, is getting empty spaces an option?

SELECT
  NTH(7, SPLIT(values, ","))
FROM
(SELECT REGEXP_REPLACE(values, ',,', ', ,') values FROM
  (SELECT "a,b,,d,e,,g" as values),
  (SELECT "a,,c,d,e,f,g" as values),
  (SELECT "a,,c,d,e,f,," as values),
  (SELECT "a,,c,d,e,f," as values),
)

Row f0_  
1   g    
2   g    
3        
4   null    
like image 163
Felipe Hoffa Avatar answered Oct 23 '22 00:10

Felipe Hoffa


This is By Design behavior, and it is not specific to SPLIT function, but to REPEATED fields in general. BigQuery REPEATED fields cannot store NULLs (same behavior as in protocol buffers), therefore nothing that SPLIT does can make NULLs appear inside REPEATED fields.

like image 26
Mosha Pasumansky Avatar answered Oct 23 '22 01:10

Mosha Pasumansky