
Dataproc + BigQuery examples - any available?

According to the Dataproc docs, it has "native and automatic integrations with BigQuery".

I have a table in BigQuery. I want to read that table and analyse it with a PySpark job on the Dataproc cluster that I've created, then write the results of this analysis back to BigQuery. You may be asking "why not just do the analysis in BigQuery directly?" The reason is that we are creating complex statistical models, and SQL is too high-level for developing them. We need something like Python or R, ergo Dataproc.

Are there any Dataproc + BigQuery examples available? I can't find any.

Graham Polley asked Oct 06 '15 02:10



1 Answer


The above example doesn't show how to write data to an output table. To do that, call saveAsNewAPIHadoopFile on the RDD that holds your results:

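import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, BigQueryOutputFormat}
import com.google.gson.JsonObject

// Chained onto the RDD of (String, JsonObject) pairs to be written
.saveAsNewAPIHadoopFile(
  hadoopConf.get(BigQueryConfiguration.TEMP_GCS_PATH_KEY), // temporary GCS path from the job configuration
  classOf[String],                                          // key class
  classOf[JsonObject],                                      // value class: one JSON object per output row
  classOf[BigQueryOutputFormat[String, JsonObject]],        // output format that writes to BigQuery
  hadoopConf)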

Note that the key (declared as String) is actually ignored; only the JsonObject value is written.
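
Because the key is ignored, the RDD can simply carry a null key next to each JSON row. Here is a minimal sketch of that mapping step, assuming a version of the BigQuery connector for Hadoop that still ships BigQueryOutputFormat, and assuming hadoopConf already holds the connector's output settings (project, output table, schema, temporary GCS path). The names BigQueryWriteSketch, toJson and writeResults, the (String, Double) row shape, and the column names "label" and "score" are all hypothetical and only there for illustration.

```scala
import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, BigQueryOutputFormat}
import com.google.gson.JsonObject
import org.apache.hadoop.conf.Configuration
import org.apache.spark.rdd.RDD

object BigQueryWriteSketch {

  // Hypothetical row shape (label, score); swap in your own result type.
  def toJson(row: (String, Double)): JsonObject = {
    val json = new JsonObject()
    json.addProperty("label", row._1)             // hypothetical column name
    json.addProperty("score", Double.box(row._2)) // boxed so the Number overload is chosen
    json
  }

  // Assumption: hadoopConf was configured for BigQuery output before this call.
  def writeResults(results: RDD[(String, Double)], hadoopConf: Configuration): Unit = {
    results
      // The key is ignored by the output format, so a null String is enough.
      .map(row => (null.asInstanceOf[String], toJson(row)))
      .saveAsNewAPIHadoopFile(
        hadoopConf.get(BigQueryConfiguration.TEMP_GCS_PATH_KEY),
        classOf[String],
        classOf[JsonObject],
        classOf[BigQueryOutputFormat[String, JsonObject]],
        hadoopConf)
  }
}
```

Each JsonObject should have one property per column in the output table's schema, so toJson is the place to adapt when your result rows look different.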

lukeforehand answered Sep 28 '22 02:09
