In BigQuery, I have a table with a path column like this:
ID       | Path
---------+----------------------------------------
1        | foo/bar/baz
2        | foo/bar/quux/blat
I would like to be able to split the path on forward slash (/) and select one or more path parts, rejoining them.
In PostgreSQL, this is easy:
select array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')
But BigQuery doesn't seem to have any kind of range offset or array slice function.
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index)
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
    WHERE index BETWEEN 1 AND 3
  ) adjusted_path
FROM `project.dataset.table`
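You can test and play with the above using sample data based on your question, as in the example below. Note that WITH OFFSET is zero-based, so BETWEEN 1 AND 3 returns the second through fourth parts.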
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2'
)
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index)
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
    WHERE index BETWEEN 1 AND 3
  ) adjusted_path
FROM `project.dataset.table`
with result
Row  id  path                                     adjusted_path
1    1   foo/bar/baz/foo1/bar1/baz1/              bar/baz/foo1
2    2   foo/bar/quux/blat/foo2/bar2/quux2/blat2  bar/quux/blat
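If you want the exact equivalent of the PostgreSQL [1:3] slice from the question (the first three parts), the bounds would need to be 0 and 2, because PostgreSQL array subscripts start at one while WITH OFFSET starts at zero. A minimal sketch against the two rows from the question; the CTE name sample and the alias first_three are made up for illustration:

```sql
#standardSQL
-- Hypothetical inline sample data: the two rows shown in the question.
WITH sample AS (
  SELECT 1 AS id, 'foo/bar/baz' AS path UNION ALL
  SELECT 2, 'foo/bar/quux/blat'
)
SELECT id, path,
  (
    SELECT STRING_AGG(part, '/' ORDER BY index)
    FROM UNNEST(SPLIT(path, '/')) part WITH OFFSET index
    -- Offsets are zero-based, so 0..2 keeps the first three parts.
    WHERE index BETWEEN 0 AND 2
  ) AS first_three
FROM sample
```

This should give foo/bar/baz for id 1 and foo/bar/quux for id 2. Paths with fewer than three parts simply come back whole, since the filter only keeps the offsets that exist.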
If for some reason you want to keep your query "inline", similar to what you use in PostgreSQL (array_to_string((regexp_split_to_array(path, '/'))[1:3], '/')), you can introduce a SQL UDF (let's name it ARRAY_SLICE), as in the example below
#standardSQL
CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64)
RETURNS ARRAY<STRING> AS (
  ARRAY(
    SELECT part FROM UNNEST(arr) part WITH OFFSET index
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
SELECT id, path,
  ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
FROM `project.dataset.table`
Obviously, applied to the same sample data, it gives the same result
#standardSQL
CREATE TEMP FUNCTION ARRAY_SLICE(arr ARRAY<STRING>, start INT64, finish INT64)
RETURNS ARRAY<STRING> AS (
  ARRAY(
    SELECT part FROM UNNEST(arr) part WITH OFFSET index
    WHERE index BETWEEN start AND finish ORDER BY index
  )
);
WITH `project.dataset.table` AS (
  SELECT 1 id, 'foo/bar/baz/foo1/bar1/baz1/' path UNION ALL
  SELECT 2, 'foo/bar/quux/blat/foo2/bar2/quux2/blat2'
)
SELECT id, path,
  ARRAY_TO_STRING(ARRAY_SLICE(SPLIT(path, '/'), 1, 3), '/') adjusted_path
FROM `project.dataset.table`
Row  id  path                                     adjusted_path
1    1   foo/bar/baz/foo1/bar1/baz1/              bar/baz/foo1
2    2   foo/bar/quux/blat/foo2/bar2/quux2/blat2  bar/quux/blat