Dataproc + BigQuery examples - any available?

Question

According to the Dataproc docs, it has "native and automatic integrations with BigQuery".

I have a table in BigQuery. I want to read that table and analyze it with a PySpark job on the Dataproc cluster I've created, then write the results of that analysis back to BigQuery. You may be asking "why not just do the analysis in BigQuery directly?" The reason is that we are building complex statistical models, and SQL is too high-level for developing them. We need something like Python or R, ergo Dataproc.

Are there any Dataproc + BigQuery examples available? I can't find any.

lukeforehand · Accepted Answer

The above example doesn't show how to write data to an output table. You need to do this:

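```scala
import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, BigQueryOutputFormat}
import com.google.gson.JsonObject
// `results` (hypothetical name) is an RDD[(String, JsonObject)]; `hadoopConf` has BigQuery output configured
results.saveAsNewAPIHadoopFile(
  hadoopConf.get(BigQueryConfiguration.TEMP_GCS_PATH_KEY),
  classOf[String],
  classOf[JsonObject],
  classOf[BigQueryOutputFormat[String, JsonObject]],
  hadoopConf)
```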

where the String key is ignored; only the JsonObject values are written to BigQuery.
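To put the write step in context, here is a minimal sketch of the full round trip the question describes: read a BigQuery table into Spark, run some analysis, and write the results back. It assumes the same older com.google.cloud.hadoop.io.bigquery connector API that the answer uses (the release that provides TEMP_GCS_PATH_KEY and BigQueryOutputFormat) is on the classpath; later connector releases changed the output API, so check the version on your cluster. The project, bucket, dataset and table names are hypothetical placeholders, and the per-word sum over the public publicdata:samples.shakespeare table is only a stand-in for the real statistical model.

```scala
import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, BigQueryOutputFormat, GsonBigQueryInputFormat}
import com.google.gson.JsonObject
import org.apache.hadoop.io.LongWritable
import org.apache.spark.{SparkConf, SparkContext}

object BigQueryRoundTrip {
  // Builds one output row; the property names must match the output schema below.
  def toJson(word: String, count: Long): JsonObject = {
    val row = new JsonObject()
    row.addProperty("word", word)
    row.addProperty("word_count", java.lang.Long.valueOf(count))
    row
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bigquery-round-trip"))
    val conf = sc.hadoopConfiguration

    // Hypothetical values: replace with your own project, bucket, dataset and table.
    val projectId = "my-project"
    val bucket = "my-bucket"
    val inputTable = "publicdata:samples.shakespeare"

    // Project and GCS bucket the connector uses for temporary data.
    conf.set(BigQueryConfiguration.PROJECT_ID_KEY, projectId)
    conf.set(BigQueryConfiguration.GCS_BUCKET_KEY, bucket)

    // Input: the table to read.
    BigQueryConfiguration.configureBigQueryInput(conf, inputTable)

    // Output: destination table, its schema, and the temporary GCS path used by the write.
    BigQueryConfiguration.configureBigQueryOutput(
      conf, projectId, "my_dataset", "word_counts",
      """[{"name": "word", "type": "STRING"}, {"name": "word_count", "type": "INTEGER"}]""")
    conf.set(BigQueryConfiguration.TEMP_GCS_PATH_KEY, s"gs://$bucket/tmp/bigquery/output")

    // Each input row arrives as a (LongWritable, JsonObject) pair.
    val rows = sc.newAPIHadoopRDD(
      conf, classOf[GsonBigQueryInputFormat], classOf[LongWritable], classOf[JsonObject])

    // Placeholder analysis: total word_count per lower-cased word.
    val counts = rows
      .map { case (_, json) =>
        (json.get("word").getAsString.toLowerCase, json.get("word_count").getAsLong)
      }
      .reduceByKey(_ + _)

    // BigQueryOutputFormat ignores the String key, so null is passed as the key.
    counts
      .map { case (word, count) => (null: String, toJson(word, count)) }
      .saveAsNewAPIHadoopFile(
        conf.get(BigQueryConfiguration.TEMP_GCS_PATH_KEY),
        classOf[String],
        classOf[JsonObject],
        classOf[BigQueryOutputFormat[String, JsonObject]],
        conf)

    sc.stop()
  }
}
```

Keeping the row conversion in a small toJson helper separates the BigQuery-specific JSON shape from the analysis, so the model code can stay in plain Scala tuples until the final write. Packaged into a jar, the object could be submitted as a Spark job on the cluster (for example with gcloud dataproc jobs submit spark).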

Tags: google-cloud-platform, google-bigquery, google-cloud-dataproc

Graham Polley
