
How to use BigQuery Standard SQL in Dataflow?

I would like to run a simple query using BigQuery Standard SQL within Dataflow, but I can't find where to enable this option. How can I do that?

pipeline.apply(Read.named(metricName + " Read").fromQuery("select * from table1 UNION DISTINCT select * from table2"));

When I try to run it I receive the error:

2016-07-20T13:35:22.543Z: Error:   (6e0ad847af078af9): Workflow failed. Causes: (fe6c7bcb1a35a057): S01:warehouse_handled_returns Read/DataflowPipelineRunner.BatchBigQueryIONativeRead+ParMultiDo(FormatData)+warehouse_handled_returns Write/DataflowPipelineRunner.BatchBigQueryIOWrite/DataflowPipelineRunner.BatchBigQueryIONativeWrite failed., (7f29f1d9435d27bc): BigQuery execution failed., (7f29f1d9435d2823): Error:
Message: Encountered "" at line 23, column 27.

HTTP Code: 400
Henrique Martins asked Jul 20 '16 14:07



2 Answers

You can now use standard SQL with Dataflow.

https://cloud.google.com/dataflow/model/bigquery-io

// Read with a query and opt in to Standard SQL for that query.
PCollection<TableRow> weatherData = p.apply(
    BigQueryIO.Read
        .named("ReadYearAndTemp")
        .fromQuery("SELECT year, mean_temp FROM `samples.weather_stations`")
        .usingStandardSql());
Graham Polley answered Nov 10 '22 14:11


Until Dataflow formally supports BigQuery Standard SQL, one workaround is to start the query with the following comment on its own line:

#StandardSQL

This instructs BigQuery to use Standard SQL instead of Legacy SQL.
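
As a rough sketch of that workaround, the prefix can be prepended to the query string before it is handed to fromQuery. The class StandardSqlPrefix and the helper withStandardSqlPrefix are hypothetical names used only for illustration; they are not part of the Dataflow SDK. The query is the one from the question.

```java
public class StandardSqlPrefix {

    // Hypothetical helper (not an SDK method): puts the dialect comment on
    // its own first line, followed by the original query text.
    public static String withStandardSqlPrefix(String query) {
        return "#standardSQL\n" + query;
    }

    public static void main(String[] args) {
        String query = withStandardSqlPrefix(
            "select * from table1 UNION DISTINCT select * from table2");

        // The resulting string would then be passed to fromQuery(...)
        // in place of the bare query.
        System.out.println(query);
    }
}
```

The newline after the comment matters: without it, the rest of the query would become part of the comment.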

Mosha Pasumansky answered Nov 10 '22 15:11