BigQuery can read from Google Drive as a federated source. I want to read a BigQuery table that points to a Drive document into my Dataflow pipeline.
Hooking up BigQuery to the file in Drive works perfectly fine.
But, when I then try to read that table into my Dataflow pipeline I (understandably) get the following error:
No suitable credentials found to access Google Drive. Contact the table owner for assistance.
[..]
PCollection<TableRow> results = pipeline
    .apply("whatever", BigQueryIO.Read
        .fromQuery("SELECT * FROM [CPT_7414_PLAYGROUND.google_drive_test]"))
    .apply(ParDo.of(new DoFn<TableRow, TableRow>() {
        [..]
How do I give Dataflow permission to read from a BigQuery table that points to Drive?
Dataflow does not currently support reading from a federated table backed by Drive, but this is coming soon.
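In the meantime, outside of Dataflow, you can query a Drive-backed federated table from the command line if the credentials carry the Drive scope. A minimal sketch, assuming the gcloud and bq CLIs are installed and the table name from the question:

```shell
# Re-authenticate so the obtained credentials include the Google Drive
# scope in addition to the usual Cloud Platform scopes; without it,
# queries over Drive-backed tables fail with the error quoted above.
gcloud auth login --enable-gdrive-access

# The same legacy-SQL query from the question then works via the CLI.
bq query "SELECT * FROM [CPT_7414_PLAYGROUND.google_drive_test]"
```

The underlying issue is the same in both cases: whatever identity runs the query (your user account here, the Dataflow worker service account in the pipeline) needs credentials scoped for Drive and read access to the document itself.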