Spark - Adding JDBC Driver JAR to Google Dataproc

I am trying to write via JDBC:

df.write.jdbc("jdbc:postgresql://123.123.123.123:5432/myDatabase", "myTable", props)

The Spark docs explain that the configuration option spark.driver.extraClassPath cannot be used to add JDBC driver JARs when running in client mode (the mode Dataproc uses), because the driver JVM has already been started.

I tried adding the JAR path in Dataproc's submit command:

gcloud beta dataproc jobs submit spark ... \
    --jars file:///home/bryan/org.postgresql.postgresql-9.4-1203-jdbc41.jar

I also added a call to load the driver class:

  Class.forName("org.postgresql.Driver")

But I still get the error:

java.sql.SQLException: No suitable driver found for jdbc:postgresql://123.123.123.123:5432/myDatabase 
asked Oct 05 '15 by BAR


2 Answers

In my experience, adding the driver class to the connection properties usually solves the problem:

props.put("driver", "org.postgresql.Driver")
df.write.jdbc(url, table, props)
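If you want to confirm whether the driver JVM can actually see the PostgreSQL driver at all, one quick diagnostic is to list every JDBC driver currently registered with java.sql.DriverManager. This is a sketch added for illustration (the helper name registeredJdbcDrivers is not from any API), not part of the original answer:

```scala
import java.sql.DriverManager

// Collect the class names of every JDBC driver registered with
// DriverManager in the current JVM.
def registeredJdbcDrivers(): List[String] = {
  val e = DriverManager.getDrivers
  var names = List.empty[String]
  while (e.hasMoreElements) names ::= e.nextElement().getClass.getName
  names
}

// If org.postgresql.Driver is absent from this list on the Spark driver,
// a "No suitable driver found" error for jdbc:postgresql:// URLs is expected.
registeredJdbcDrivers().foreach(println)
```

Running this inside the Spark driver (not an executor) tells you whether the JAR ever made it onto the driver classpath, which separates a classpath problem from a connection problem.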
answered Sep 21 '22 by zero323


You may want to try adding --driver-class-path to the very end of your command arguments:

gcloud beta dataproc jobs submit spark ... \
    --jars file:///home/bryan/org.postgresql.postgresql-9.4-1203-jdbc41.jar \
    --driver-class-path /home/bryan/org.postgresql.postgresql-9.4-1203-jdbc41.jar

If you are staging the JAR onto the cluster before the job anyway, another approach is to drop it into /usr/lib/hadoop/lib/, where it is automatically picked up on the driver classpath for both Hadoop and Spark jobs.
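One way to do that staging at cluster-creation time is a small script run on each node. The sketch below is illustrative only: the jar path is a placeholder, and both locations are parameterized so the defaults can be overridden:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths (assumptions): where the driver jar was staged on the
# node, and the directory Hadoop/Spark scan for extra driver-side jars.
JAR_SRC="${JAR_SRC:-/tmp/postgresql-9.4-1203-jdbc41.jar}"
HADOOP_LIB_DIR="${HADOOP_LIB_DIR:-/usr/lib/hadoop/lib}"

# Copy the JDBC driver into the shared lib dir, if it was staged.
if [ -f "$JAR_SRC" ]; then
  cp "$JAR_SRC" "$HADOOP_LIB_DIR/"
fi
```

In practice you would first download the jar to the node (for example from a GCS bucket you control) before copying it into place.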

answered Sep 21 '22 by Dennis Huo