
BigQuery select * except nested column

What I want to do should be simple: given a BigQuery table, I want to select all columns (including nested ones) apart from a few. The tricky part is that BigQuery schemas can be nested, and the fields I want to exclude sit inside other records.

I've found the SELECT * EXCEPT clause in the BigQuery documentation, which seems very promising. The problem is that it doesn't seem to support excluding nested fields.

For example, using the public github_nested table, we can write a query like:

#standardSQL
SELECT * except (payload) FROM `bigquery-public-data.samples.github_nested` LIMIT 1000

This works as expected: the payload record is removed from the results. Now suppose we only want to remove payload.comment while keeping the rest of the payload record in the response. I tried:

#standardSQL
SELECT * except (payload.comment) FROM `bigquery-public-data.samples.github_nested` LIMIT 1000

However, this fails.

Anyone know of a way to accomplish this?

Thanks!

Spikey asked Dec 07 '16 14:12



1 Answer

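The way to think about the problem is that you still want a payload column in the result, but with a different structure, namely one that excludes comment. SELECT * EXCEPT only accepts top-level column names, so it cannot do this directly. Instead, you can use SELECT * REPLACE, which keeps the column's name and position but substitutes a new expression for its value. For example,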

#standardSQL
SELECT * REPLACE ((SELECT AS STRUCT payload.* EXCEPT (comment)) AS payload)
FROM `bigquery-public-data.samples.github_nested`
LIMIT 1000;
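
To try the same pattern without scanning the public table, here is a minimal sketch that runs against an inline row. The table name events, the column id, and the payload fields action, comment, and issue_id are made up for illustration and are not the real github_nested schema.

```sql
#standardSQL
-- Hypothetical one-row table standing in for a table with a nested record.
WITH events AS (
  SELECT
    1 AS id,
    STRUCT('push' AS action, 'nice work' AS comment, 42 AS issue_id) AS payload
)
-- Rebuild payload as a new STRUCT holding every field except comment,
-- then substitute it for the original payload column.
SELECT * REPLACE (
  (SELECT AS STRUCT payload.* EXCEPT (comment)) AS payload
)
FROM events;
```

Under these assumptions the result still has the columns id and payload, but payload now contains only action and issue_id. To drop several nested fields at once, list them together, for example EXCEPT (comment, issue_id).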
Elliott Brossard answered Sep 23 '22 19:09