Is it possible to use the equivalent of --autodetect
in DataFlow?
i.e. can we load data into a BQ table without specifying a schema, equivalent to how we can load data from a CSV with --autodetect
?
(potentially related question)
If you are using protocol buffers as objects in your PCollections (which should be performing very well on the Dataflow back-end) you might be able to use a util I wrote in the past. It will parse the schema of the protobuffer into a BigQuery schema at runtime, based on inspection of the protobuffer descriptor.
I quickly uploaded it to GitHub, it's WIP, but you might be able to use it or be inspired to write something similar using Java Reflection (I might do it myself at some point).
You can use the util as follows:
TableSchema schema = ProtobufUtils.makeTableSchema(ProtobufClass.getDescriptor());
enhanced_events.apply(BigQueryIO.Write.to(tableToWrite).withSchema(schema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
where the create disposition will create the table with the schema specified and the ProtobufClass is the class generated using your Protobuf schema and the proto compiler.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With