
Spark Unable to find JDBC Driver

So I've been using sbt with assembly to package all my dependencies into a single jar for my Spark jobs. I have several jobs where I use c3p0 to set up connection pool information, broadcast it out, and then use foreachPartition on the RDD to grab a connection and insert the data into the database. In my sbt build script, I include

"mysql" % "mysql-connector-java" % "5.1.33" 

This makes sure the JDBC connector is packaged up with the job. Everything works great.
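Roughly, the pattern looks like this (a sketch for context; the pool setup and names are illustrative, not my actual code):

```scala
// Illustrative sketch of the broadcast-pool + foreachPartition pattern
// described above. The c3p0 setup here is simplified.
import com.mchange.v2.c3p0.ComboPooledDataSource

val dataSource = new ComboPooledDataSource()
dataSource.setDriverClass("com.mysql.jdbc.Driver")
dataSource.setJdbcUrl("jdbc:mysql://some.domain.com/myschema")

val pool = sc.broadcast(dataSource)

rdd.foreachPartition { rows =>
  // One connection per partition, not per row.
  val conn = pool.value.getConnection()
  try {
    rows.foreach { row =>
      // ... build and execute an INSERT statement via conn ...
    }
  } finally {
    conn.close()  // return the connection to the pool
  }
}
```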

So recently I started playing around with SparkSQL and realized it's much easier to simply take a DataFrame and save it to a JDBC source with the new features in 1.3.0.
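For reference, this is the sort of code I mean (a sketch; the table name is illustrative):

```scala
// Sketch of the 1.3.0 DataFrame-to-JDBC save path; "mytable" is illustrative.
val url = "jdbc:mysql://some.domain.com/myschema?user=user&password=password"
df.insertIntoJDBC(url, "mytable", false)  // false = append, don't overwrite
```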

I'm getting the following exception:

java.sql.SQLException: No suitable driver found for jdbc:mysql://some.domain.com/myschema?user=user&password=password
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)

When I was running this locally I got around it by setting

SPARK_CLASSPATH=/path/where/mysql-connector-is.jar 

Ultimately, what I want to know is: why is the job not capable of finding the driver when it should be packaged up with it? My other jobs never had this problem. From what I can tell, both c3p0 and the DataFrame code make use of java.sql.DriverManager (which handles importing everything for you, as far as I can tell), so it should work just fine. If there is something that prevents the assembly method from working, what do I need to do to make it work?

Adam Ritter asked Apr 10 '15

People also ask

Where do I put JDBC driver for Spark?

To install the Simba Spark JDBC driver, unzip it and put the driver JAR files into the target folder of the SQLLine installation, where the source build placed its JAR files.

How does Spark connect to database?

To connect to any database, we basically require the common properties: database driver, DB URL, username, and password. Connecting from PySpark code requires the same set of properties; url is the JDBC URL used to connect to the database.

What is Spark JDBC?

Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources.


2 Answers

This person was having similar issue: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-DataFrame-with-MySQL-td22178.html

Have you updated your connector drivers to the most recent version? Also did you specify the driver class when you called load()?

Map<String, String> options = new HashMap<String, String>();
options.put("url", "jdbc:mysql://localhost:3306/video_rcmd?user=root&password=123456");
options.put("dbtable", "video");
options.put("driver", "com.mysql.cj.jdbc.Driver"); // here
DataFrame jdbcDF = sqlContext.load("jdbc", options);

In spark/conf/spark-defaults.conf, you can also set spark.driver.extraClassPath and spark.executor.extraClassPath to the path of your MySQL driver .jar.
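The spark-defaults.conf entries would look something like this (the jar path is a placeholder for wherever your connector lives):

```
spark.driver.extraClassPath   /path/to/mysql-connector-java-5.1.33.jar
spark.executor.extraClassPath /path/to/mysql-connector-java-5.1.33.jar
```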

insomniak answered Sep 23 '22


These options are clearly mentioned in the Spark docs: --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

The mistake I was making was mentioning these options after my application's jar.

However, the correct way is to specify these options immediately after spark-submit:

spark-submit \
  --driver-class-path /somepath/project/mysql-connector-java-5.1.30-bin.jar \
  --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar \
  --class com.package.MyClass \
  target/scala-2.11/project_2.11-1.0.jar
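The ordering matters because spark-submit treats everything after the application jar as arguments to the application's main class, so options placed there are never seen by spark-submit itself. To illustrate (paths are placeholders):

```shell
# Wrong: --jars here is passed to com.package.MyClass as an application
# argument, so spark-submit never adds the connector to the classpath.
spark-submit --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar \
  --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar

# Right: spark-submit options first, application jar last.
spark-submit \
  --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar \
  --class com.package.MyClass \
  target/scala-2.11/project_2.11-1.0.jar
```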

Ayush Vatsyayan answered Sep 26 '22