I’m having an issue connecting Spark SQL to a PostgreSQL data source. I’ve downloaded the Postgres JDBC jar and included it in an uber jar using sbt-assembly.
My (failing) source code: https://gist.github.com/geowa4/a9bc238ca7c372b95267.
I’ve also tried using sqlContext.jdbc()
preceded with classOf[org.postgresql.Driver]
as well. It appears the driver can access the Driver just fine.
Any help would be much appreciated. Thanks.
SimpleApp.scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
object SimpleApp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val commits = sqlContext.load("jdbc", Map(
"url" -> "jdbc:postgresql://192.168.59.103:5432/postgres",
"dbtable" -> "commits",
"driver" -> "org.postgresql.Driver"))
commits.select("message").show(1)
}
}
simple.sbt:
name := "simple-project"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1" % "provided"
libraryDependencies += "org.postgresql" % "postgresql" % "9.4-1201-jdbc41"
output (Edited):
Exception in thread "main" java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:102)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at SimpleApp$.main(SimpleApp.scala:17)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
EDIT: I changed the Scala version to 2.10.5 and the output changed to this. I feel like I'm making progress.
Start a Spark Shell and Connect to PostgreSQL Data To connect to PostgreSQL, set the Server, Port (the default port is 5432), and Database connection properties and set the User and Password you wish to use to authenticate to the server.
It provides a standard set of interfaces to SQL -compliant databases. PostgreSQL provides a type 4 JDBC driver. Type 4 indicates that the driver is written in Pure Java, and communicates in the database system's own network protocol.
There is a problem with general problem with JDBC, where the primordial classloader must know about the jar. In Spark 1.3 this can be addressed using the SPARK_CLASSPATH
option as described here:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#jdbc-to-other-databases
In Spark 1.4, this should be fixed by #5782.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With