
How can I get a slice of an array in BigQuery Standard SQL?

In BigQuery, I have a table with a path column like this:

ID       | Path
---------+----------------------------------------
1        | foo/bar/baz
2        | foo/bar/quux/blat

I would like to be able to split the path on forward slash (/) and select one or more path parts, rejoining them.

In PostgreSQL, this is easy:

select array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')

But BigQuery doesn't seem to have any kind of range offset or array slice function.

a paid nerd asked Dec 14 '22 12:12


1 Answer

The following works in BigQuery Standard SQL:

#standardSQL
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index) 
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index 
    WHERE index BETWEEN 1 AND 3
  ) adjusted_path
FROM `project.dataset.table`  

You can test and experiment with the query above using sample data based on your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2' 
)
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index) 
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index 
    WHERE index BETWEEN 1 AND 3
  ) adjusted_path
FROM `project.dataset.table`   

The result is:

Row     id      path                                        adjusted_path    
1       1       foo/bar/baz/foo1/bar1/baz1/                 bar/baz/foo1     
2       2       foo/bar/quux/blat/foo2/bar2/quux2/blat2     bar/quux/blat    
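
Note that WITH OFFSET numbers array elements from 0, whereas the PostgreSQL slice [1:3] in the question is 1-based. That is why the query above returns the second through fourth parts. A minimal sketch, assuming the goal is to reproduce the PostgreSQL result exactly (the first three parts), shifts the range to 0 through 2. The inline rows are the two paths from the question:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat'
)
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index)
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
    -- offsets start at 0, so 0..2 selects the first three parts
    WHERE index BETWEEN 0 AND 2
  ) adjusted_path
FROM `project.dataset.table`
```

With these rows, adjusted_path should be foo/bar/baz for id 1 and foo/bar/quux for id 2.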

If you want to keep your query close to the inline form you use in PostgreSQL (array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')), you can introduce a SQL UDF (let's name it ARRAY_SLICE), as in the example below:

#standardSQL
CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64)
RETURNS ARRAY<STRING> AS (
  ARRAY(
    SELECT part FROM UNNEST(arr) part WITH OFFSET index 
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
SELECT id, path, 
  ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
FROM `project.dataset.table`  

Applied to the same sample data, this returns the same result:

#standardSQL
CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64)
RETURNS ARRAY<STRING> AS (
  ARRAY(
    SELECT part FROM UNNEST(arr) part WITH OFFSET index 
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2' 
)
SELECT id, path, 
  ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
FROM `project.dataset.table`   

Row     id      path                                        adjusted_path    
1       1       foo/bar/baz/foo1/bar1/baz1/                 bar/baz/foo1     
2       2       foo/bar/quux/blat/foo2/bar2/quux2/blat2     bar/quux/blat    
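
The ARRAY_SLICE UDF above is tied to ARRAY<STRING>. One possible generalization, offered as a sketch rather than a tested drop-in, declares the array parameter as ANY TYPE (a templated SQL UDF parameter) and omits the RETURNS clause so the result type is inferred from the argument. The name ARRAY_SLICE_ANY and the literal inputs are hypothetical, chosen only for illustration:

```sql
#standardSQL
-- Hypothetical generic variant: arr may be an array of any element type.
CREATE TEMP FUNCTION ARRAY_SLICE_ANY(arr ANY TYPE, start INT64, finish INT64) AS (
  ARRAY(
    SELECT el FROM UNNEST(arr) el WITH OFFSET index
    -- start and finish are zero-based and inclusive
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
SELECT
  ARRAY_TO_STRING(ARRAY_SLICE_ANY(SPLIT('foo/bar/quux/blat', '/'), 0, 2), '/') AS first_three,
  ARRAY_SLICE_ANY([10, 20, 30, 40, 50], 1, 3) AS middle_numbers
```

Under these assumptions, first_three should come back as foo/bar/quux and middle_numbers as [20, 30, 40].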
Mikhail Berlyant answered Dec 28 '22 06:12