I'm hitting a very strange problem when trying to load a JDBC DataFrame into Spark SQL.
I've tried several Spark clusters - YARN, a standalone cluster, and pseudo-distributed mode on my laptop. It's reproducible on both Spark 1.3.0 and 1.3.1. The problem occurs both in spark-shell
and when executing the code with spark-submit
. I've tried the MySQL and MS SQL JDBC drivers without success.
Consider the following sample:
val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = {
  sqlContext.load("jdbc", Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "t1",
    "partitionColumn" -> "id",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "50"
  ))
}
So far so good, the schema gets resolved properly:
t1: org.apache.spark.sql.DataFrame = [id: int, name: string]
But when I evaluate the DataFrame:
t1.take(1)
the following exception occurs:
15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
When I try to open a JDBC connection on an executor:
import java.sql.DriverManager

sc.parallelize(0 until 2, 2).map { i =>
  Class.forName(driver)
  val conn = DriverManager.getConnection(url)
  conn.close()
  i
}.collect()
it works perfectly:
res1: Array[Int] = Array(0, 1)
When I run the same code on local Spark, it works perfectly too:
scala> t1.take(1)
...
res0: Array[org.apache.spark.sql.Row] = Array([1,one])
I'm using Spark pre-built with Hadoop 2.4 support.
The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh
script and run the following command:
/path/to/spark-shell --master spark://<hostname>:7077 --jars /path/to/mysql-connector-java-5.1.35.jar --driver-class-path /path/to/mysql-connector-java-5.1.35.jar
Is there a way to work around this? It looks like a severe problem, so it's strange that googling doesn't help here.
I am using Spark 1.6.1 with SQL Server and still faced the same issue. I had to add the library (sqljdbc-4.0.jar) to lib on the instance, and the line below to the conf/spark-defaults.conf
file.
spark.driver.extraClassPath lib/sqljdbc-4.0.jar
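If the executors also fail to see the driver, a matching executor-side entry can be added to the same conf/spark-defaults.conf file (the jar path below mirrors the one above and is an assumption about your layout):

spark.executor.extraClassPath lib/sqljdbc-4.0.jar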
Apparently this issue has been recently reported:
https://issues.apache.org/jira/browse/SPARK-6913
The problem is in java.sql.DriverManager, which doesn't see drivers loaded by ClassLoaders other than the bootstrap ClassLoader.
As a temporary workaround, it's possible to add the required drivers to the boot classpath of the executors.
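For example (a sketch, not a verified fix): the driver jar can be appended to the executor JVMs' boot classpath through spark.executor.extraJavaOptions, assuming the jar exists at the same path on every worker node:

/path/to/spark-shell --master spark://<hostname>:7077 --driver-class-path /path/to/mysql-connector-java-5.1.35.jar --conf "spark.executor.extraJavaOptions=-Xbootclasspath/a:/path/to/mysql-connector-java-5.1.35.jar"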
UPDATE: This pull request fixes the problem: https://github.com/apache/spark/pull/5782
UPDATE 2: The fix has been merged into Spark 1.4.
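On Spark 1.4+ the same table can also be loaded through the DataFrameReader API; a minimal sketch, reusing the connection details from the question:

val t1 = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "t1")
  .option("partitionColumn", "id")
  .option("lowerBound", "0")
  .option("upperBound", "100")
  .option("numPartitions", "50")
  .load()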
For writing data to MySQL
In Spark 1.4.0, you have to load from MySQL before writing to it, because Spark loads the JDBC driver in the load function but not in the write function. You also have to put the jar on every worker node and set its path in the spark-defaults.conf file on each node. This issue has been fixed in Spark 1.5.0:
https://issues.apache.org/jira/browse/SPARK-10036
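For reference, a minimal sketch of a write on Spark 1.5.0+ (where the write path loads the driver); the URL, table name, and credentials are placeholder assumptions:

import java.util.Properties

val props = new Properties()
props.setProperty("driver", "com.mysql.jdbc.Driver")
props.setProperty("user", "user")        // assumption: replace with real credentials
props.setProperty("password", "secret")  // assumption

// df is an existing DataFrame; "t1" is the target table
df.write.jdbc("jdbc:mysql://localhost:3306/test", "t1", props)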