I would like to run a simple query using BigQuery Standard SQL within Dataflow, but I can't find where to enable this option. How can I do that?
pipeline.apply(Read.named(metricName + " Read").fromQuery("select * from table1 UNION DISTINCT select * from table2"));
When I try to run it I receive the error:
2016-07-20T13:35:22.543Z: Error: (6e0ad847af078af9): Workflow failed. Causes: (fe6c7bcb1a35a057): S01:warehouse_handled_returns Read/DataflowPipelineRunner.BatchBigQueryIONativeRead+ParMultiDo(FormatData)+warehouse_handled_returns Write/DataflowPipelineRunner.BatchBigQueryIOWrite/DataflowPipelineRunner.BatchBigQueryIONativeWrite failed., (7f29f1d9435d27bc): BigQuery execution failed., (7f29f1d9435d2823): Error:
Message: Encountered "" at line 23, column 27.
HTTP Code: 400
To create a Dataflow SQL job, you must write and run a Dataflow SQL query. Note: To use Dataflow SQL, you might need to enable the Data Catalog API in the Google Cloud project that you're using to write and run queries.
BigQuery supports the Google Standard SQL dialect, but a legacy SQL dialect is also available. If you are new to BigQuery, you should use Google Standard SQL as it supports the broadest range of functionality. For example, features such as DDL and DML statements are only supported using Google Standard SQL.
You can now use standard SQL with Dataflow.
https://cloud.google.com/dataflow/model/bigquery-io
PCollection<TableRow> weatherData = p.apply(
    BigQueryIO.Read
        .named("ReadYearAndTemp")
        .fromQuery("SELECT year, mean_temp FROM `samples.weather_stations`")
        .usingStandardSql());
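Until Dataflow formally supports BigQuery Standard SQL, one workaround is to start the query with the following comment:
#StandardSQL
This instructs BigQuery to use Standard SQL instead of Legacy SQL. The comment must be the first line of the query and be followed by a newline; it is not case-sensitive.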
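As a rough illustration of the prefix workaround, the sketch below prepends the dialect comment to a query string before it is handed to fromQuery(...). The class name StandardSqlPrefix and the method withStandardSqlPrefix are hypothetical helpers invented for this example, not part of the Dataflow SDK, and the sketch assumes the prefix has to sit on its own first line.

```java
// Hypothetical helper (not part of the Dataflow SDK): builds a query string
// that starts with the Standard SQL dialect comment on its own line.
final class StandardSqlPrefix {
    // Dialect comment that BigQuery reads from the first line of the query.
    static final String PREFIX = "#standardSQL";

    // Returns the query with the prefix on the first line, leaving queries
    // that already start with it (in any letter case) unchanged.
    static String withStandardSqlPrefix(String query) {
        String trimmed = query.trim();
        if (trimmed.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return trimmed;
        }
        return PREFIX + "\n" + trimmed;
    }

    public static void main(String[] args) {
        // Query taken from the question above.
        String query = withStandardSqlPrefix(
            "select * from table1 UNION DISTINCT select * from table2");
        System.out.println(query);
    }
}
```

The resulting string could then be passed to fromQuery(...) in place of the raw query; whether the pipeline accepts it still depends on the SDK version in use.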