I wanted to take advantage of the new BigQuery functionality of time partitioned tables, but am unsure this is currently possible in the 1.6 version of the Dataflow SDK.
Looking at the BigQuery JSON API, to create a day partitioned table one needs to pass in a
"timePartitioning": { "type": "DAY" }
option, but the com.google.cloud.dataflow.sdk.io.BigQueryIO interface only allows specifying a TableReference.
I thought that maybe I could pre-create the table, and sneak in a partition decorator via a BigQueryIO.Write.toTableReference lambda..? Is anyone else having success with creating/writing partitioned tables via Dataflow?
This seems like a similar issue to setting the table expiration time which isn't currently available either.
You can partition BigQuery tables by: Time-unit column: Tables are partitioned based on a TIMESTAMP , DATE , or DATETIME column in the table. Ingestion time: Tables are partitioned based on the timestamp when BigQuery ingests the data. Integer range: Tables are partitioned based on an integer column.
Query Specific Partitions when you create a table partitioned by according to a TIMESTAMP or DATE column. Tables partitioned according to a TIMESTAMP or DATE column do not have pseudo-columns! To limit the number of partitions analyzed when querying partitioned tables, you can use a predicate filter (WHERE clause).
Like clustering, partitioning uses user-defined partition columns to specify how data is partitioned and what data is stored in each partition. Unlike clustering, partitioning provides granular query cost estimates before you run a query.
As Pavan says, it is definitely possible to write to partition tables with Dataflow. Are you using the DataflowPipelineRunner
operating in streaming mode or batch mode?
The solution you proposed should work. Specifically, if you pre-create a table with date partitioning set up, then you can use a BigQueryIO.Write.toTableReference
lambda to write to a date partition. For example:
/**
* A Joda-time formatter that prints a date in format like {@code "20160101"}.
* Threadsafe.
*/
private static final DateTimeFormatter FORMATTER =
DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC);
// This code generates a valid BigQuery partition name:
Instant instant = Instant.now(); // any Joda instant in a reasonable time range
String baseTableName = "project:dataset.table"; // a valid BigQuery table name
String partitionName =
String.format("%s$%s", baseTableName, FORMATTER.print(instant));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With