Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery select * except two columns

I would like to select everything from the public BigQuery github_repos dataset except from two records: author.name AND difference.old_mode. Based on a similar question I asked, I think I want to run a query similar to

#standardSQL
SELECT * REPLACE ((SELECT AS STRUCT author.* EXCEPT (name)) AS author), 
REPLACE ((SELECT AS STRUCT difference.* EXCEPT (old_mode)) AS difference)
FROM `bigquery-public-data.github_repos.commits`
LIMIT 1000;

If I run the author exclusion, it works well:

#standardSQL
SELECT * REPLACE ((SELECT AS STRUCT author.* EXCEPT (name)) AS author)
FROM `bigquery-public-data.github_repos.commits`
LIMIT 1000;

However, the difference exclusion has an error:

#standardSQL
SELECT * REPLACE ((SELECT AS STRUCT difference.* EXCEPT (old_mode)) AS difference)
FROM `bigquery-public-data.github_repos.commits`
LIMIT 1000;

Error:

Dot-star is not supported for type ARRAY<STRUCT<old_mode INT64, new_mode INT64, old_path STRING, ...>> at [2:41]

Thank you.

Update Not a duplicate of SQL server question.

like image 274
Spikey Avatar asked Dec 07 '16 15:12

Spikey


People also ask

How do you exclude columns in BigQuery?

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output. Note: SELECT * EXCEPT does not exclude columns that do not have names.

Is there any way in BigQuery to select all columns except a few?

Use EXCEPT in a SELECT * to select all fields from a table except the columns you don't want. In the example below, I created a CTE named orders and selected all of the columns except for item_id and item_name.

How do I select two columns in BigQuery?

The syntax for select statement is SELECT followed by the column's name where you want to pull the data from and then from the table name. To pull the data from multiple columns, you will have to mention the column names separated by a comma in the SELECT statement and then from the table name.


1 Answers

As a self-contained example, consider this query:

WITH T AS (
  SELECT 10 AS a, ARRAY<STRUCT<x INT64, y STRING, z BOOL>>[
      (1, 'foo', true)] AS arr UNION ALL
  SELECT 11, ARRAY<STRUCT<x INT64, y STRING, z BOOL>>[
      (2, 'bar', false), (3, 'baz', true)]
)
SELECT * FROM T;

It returns a column a of type INT64 and a column arr of type ARRAY<STRUCT<x INT64, y STRING, z BOOL>>. If you wanted to return a modification of arr where the struct inside the array omits y, you could use a combination of SELECT * REPLACE and SELECT * EXCEPT:

WITH T AS (
  SELECT 10 AS a, ARRAY<STRUCT<x INT64, y STRING, z BOOL>>[
      (1, 'foo', true)] AS arr UNION ALL
  SELECT 11, ARRAY<STRUCT<x INT64, y STRING, z BOOL>>[
      (2, 'bar', false), (3, 'baz', true)]
)
SELECT * REPLACE(ARRAY(SELECT AS STRUCT * EXCEPT (y) FROM UNNEST(arr)) AS arr)
FROM T;

The idea is to replace the original array with a new one, and we use an ARRAY subquery with SELECT AS STRUCT and * EXCEPT to reconstruct the array with struct elements whose fields don't include y.

Going back to the query in the question, you can apply the same idea to difference and old_mode:

SELECT * REPLACE (
  ARRAY(SELECT AS STRUCT * EXCEPT (old_mode) FROM UNNEST(difference)) AS difference
)
FROM `bigquery-public-data.github_repos.commits`
LIMIT 1000;

The query result contains a difference array whose struct doesn't include the old_mode field.

like image 85
Elliott Brossard Avatar answered Oct 01 '22 17:10

Elliott Brossard