What I want to do should be simple: given a BigQuery table, I want to select all columns (including nested ones) apart from a few. The tricky thing is that BigQuery has a nested structure, and the few I want to exclude are nested within other records.
I've found the SELECT * EXCEPT clause in the BigQuery documentation, which seems very promising. The problem is that it doesn't seem to support excluding nested fields.
For example, using the public github_nested dataset we can write a query like
#standardSQL
SELECT * except (payload) FROM `bigquery-public-data.samples.github_nested` LIMIT 1000
This does what we expect successfully by removing the payload record from the results. Let's imagine now that we only want to remove payload.comment, thereby preserving the rest of the payload record contents in the response. I tried
#standardSQL
SELECT * except (payload.comment) FROM `bigquery-public-data.samples.github_nested` LIMIT 1000
However, this fails.
Anyone know of a way to accomplish this?
Thanks!
A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output. Note: SELECT * EXCEPT does not exclude columns that do not have names.
The syntax of a SELECT statement is SELECT followed by the name of the column you want to pull data from, then FROM and the table name. To pull data from multiple columns, list the column names separated by commas after SELECT, followed by FROM and the table name.
BigQuery nested fields are fields grouped together as a single entity, much like an object or a struct.
A repeated field can be accessed as an ARRAY type in Google Standard SQL. A RECORD column can have REPEATED mode, which is represented as an array of STRUCT types. Also, a field within a record can be repeated, which is represented as a STRUCT that contains an ARRAY. An array cannot contain another array directly.
The way to think of the problem is that you still want a payload column in the result, but you want it to have a different structure, namely to exclude comment. In this case, you can use SELECT * REPLACE to make the modification. For example,
#standardSQL
SELECT * REPLACE ((SELECT AS STRUCT payload.* EXCEPT (comment)) AS payload)
FROM `bigquery-public-data.samples.github_nested`
LIMIT 1000;
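The same idea should extend to fields nested more than one level deep, and to REPEATED records, by rebuilding each level in turn. The following is only a sketch: the events table, its columns (id, payload, items), and the field names (action, comment, author, body, name, secret) are made up for illustration and are not part of github_nested.

```sql
#standardSQL
-- Hypothetical inline data so the query does not depend on any real schema.
WITH events AS (
  SELECT
    1 AS id,
    STRUCT(
      'created' AS action,
      STRUCT('alice' AS author, 'looks good' AS body) AS comment
    ) AS payload,
    [STRUCT('a' AS name, 'x' AS secret),
     STRUCT('b' AS name, 'y' AS secret)] AS items
)
SELECT * REPLACE (
  -- Two levels deep: rebuild payload, and inside it rebuild comment
  -- without its body field.
  (SELECT AS STRUCT payload.* REPLACE (
     (SELECT AS STRUCT payload.comment.* EXCEPT (body)) AS comment
   )) AS payload,
  -- Repeated record: rebuild every array element without its secret field.
  ARRAY(
    SELECT AS STRUCT item.* EXCEPT (secret)
    FROM UNNEST(items) AS item
  ) AS items
)
FROM events;
```

Each nested level needs its own SELECT AS STRUCT, so the query grows with the depth of the field you want to drop. Note also that EXCEPT must leave at least one field behind in each struct it is applied to.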