Creating/Writing to Partitioned BigQuery table via Google Cloud Dataflow

I want to take advantage of BigQuery's new time-partitioned tables, but I am unsure whether this is currently possible in version 1.6 of the Dataflow SDK.

Looking at the BigQuery JSON API, to create a day-partitioned table one needs to pass in a

"timePartitioning": { "type": "DAY" }

option, but the com.google.cloud.dataflow.sdk.io.BigQueryIO interface only allows specifying a TableReference.
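For context, here is a rough sketch of where that option would sit in the body of a table-creation (tables.insert) request. It only builds the JSON text and does not send anything. The class name, method name, and the placeholder IDs are made up for illustration, and a real request would normally also carry a schema.

```java
class PartitionedTableRequest {
    /**
     * Builds a JSON body describing a day-partitioned table.
     * Assumes the three IDs contain no characters that need JSON escaping.
     */
    static String requestBody(String projectId, String datasetId, String tableId) {
        return "{\n"
            + "  \"tableReference\": {\n"
            + "    \"projectId\": \"" + projectId + "\",\n"
            + "    \"datasetId\": \"" + datasetId + "\",\n"
            + "    \"tableId\": \"" + tableId + "\"\n"
            + "  },\n"
            + "  \"timePartitioning\": { \"type\": \"DAY\" }\n"
            + "}";
    }

    public static void main(String[] args) {
        // "my-project", "my_dataset" and "my_table" are placeholder names.
        System.out.println(requestBody("my-project", "my_dataset", "my_table"));
    }
}
```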

I thought that maybe I could pre-create the table and sneak in a partition decorator via a BigQueryIO.Write.toTableReference lambda. Is anyone else having success creating or writing to partitioned tables via Dataflow?

This seems similar to the issue of setting the table expiration time, which isn't currently available either.

asked Jun 30 '16 05:06

ptf

1 Answer

As Pavan says, it is definitely possible to write to partitioned tables with Dataflow. Are you using the DataflowPipelineRunner in streaming mode or batch mode?

The solution you proposed should work. Specifically, if you pre-create a table with date partitioning set up, then you can use a BigQueryIO.Write.toTableReference lambda to write to a date partition. For example:

import org.joda.time.DateTimeZone;
import org.joda.time.Instant;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

/**
 * A Joda-time formatter that prints a date in format like {@code "20160101"}.
 * Threadsafe.
 */
private static final DateTimeFormatter FORMATTER =
    DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC);

// This code generates a valid BigQuery partition name:
Instant instant = Instant.now(); // any Joda instant in a reasonable time range
String baseTableName = "project:dataset.table"; // a valid BigQuery table name
String partitionName =
    String.format("%s$%s", baseTableName, FORMATTER.print(instant));
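To try the decorator logic without the Dataflow SDK or Joda-Time on the classpath, the following standalone sketch does the same thing with java.time from the JDK. Note that this swaps Joda-Time for java.time, and that the class and method names are hypothetical. In a pipeline, a function like partitionSpec would presumably be called from the per-window lambda with a timestamp taken from the window.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class PartitionDecorator {
    // Prints a date like "20160101" in UTC. DateTimeFormatter is immutable and thread-safe.
    private static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /** Returns "baseTableName$yyyyMMdd" for the UTC day that contains the instant. */
    static String partitionSpec(String baseTableName, Instant instant) {
        return baseTableName + "$" + FORMATTER.format(instant);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2016-06-30T05:06:00Z");
        // prints project:dataset.table$20160630
        System.out.println(partitionSpec("project:dataset.table", instant));
    }
}
```

Because the formatter is pinned to UTC, an instant just before midnight UTC still lands in that day's partition, regardless of the worker's local time zone.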

answered Sep 28 '22 06:09

Dan Halperin

Tags:

google-bigquery

google-cloud-dataflow

apache-beam-io
