
Connecting MySQL with PySpark

I want to connect MySQL with PySpark. I am running PySpark from a Jupyter notebook. However, when I run this:

dataframe_mysql = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/playground",
    driver = "com.mysql.jdbc.Driver",
    dbtable = "play1",
    user="root",
    password="sp123").load()

I get this error:

Py4JJavaError: An error occurred while calling o89.load. : java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.

How can I resolve this error and load MySQL data into a PySpark DataFrame?

Anand Nautiyal asked Aug 21 '18 07:08



2 Answers

I use a Python script:

from pyspark.sql import SparkSession

# extraClassPath must point at the connector jar so the JVM can load the driver class
spark = SparkSession \
        .builder \
        .appName('test') \
        .master('local[*]') \
        .config("spark.driver.extraClassPath", "<path to mysql-connector-java-5.1.49-bin.jar>") \
        .getOrCreate()

df = spark.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost/<database_name>") \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", "<table_name>") \
        .option("user", "<user>").option("password", "<password>") \
        .load()

Replace everything in <> with your own values.
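
A wrong jar path in spark.driver.extraClassPath tends to surface as the same ClassNotFoundException shown in the question, so it can help to check the path before building the session. The sketch below is only an illustration: jdbc_options and check_jar are hypothetical helper names (not part of PySpark), and the default port 3306 and driver class are taken from the question.

```python
import os


def jdbc_options(database, table, user, password,
                 host="localhost", port=3306,
                 driver="com.mysql.jdbc.Driver"):
    """Build the option dict for spark.read.format("jdbc") (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": driver,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def check_jar(path):
    """Fail early with a clear message if the connector jar is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"MySQL connector jar not found: {path}")
    return path
```

With these in place, the read could be written as spark.read.format("jdbc").options(**jdbc_options("<database_name>", "<table_name>", "<user>", "<password>")).load(), and check_jar can wrap the path passed to .config(...).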

Nontapat Sumalnop answered Sep 19 '22 11:09



Using a notebook launched by pyspark

Install the MySQL Java connector (Connector/J) via Maven or Gradle, or download the jar file directly. Then pass the jar path to pyspark with the --jars argument. If you chose the Maven approach, the command for connector version 8.0.11 looks like this:

pyspark --jars "${HOME}/.m2/repository/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar"
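
One thing to watch with the 8.x connector: its driver class is com.mysql.cj.jdbc.Driver, and the older name com.mysql.jdbc.Driver is deprecated there. A minimal sketch of picking the class name and the local Maven jar path from a version string follows; m2_jar_path and driver_class are hypothetical helpers, and the "major version 8 or later" rule is a simplifying assumption.

```python
import os


def m2_jar_path(version, home=None):
    """Path of the connector jar inside the local Maven repository (~/.m2)."""
    home = home or os.path.expanduser("~")
    return os.path.join(
        home, ".m2", "repository", "mysql", "mysql-connector-java",
        version, f"mysql-connector-java-{version}.jar",
    )


def driver_class(version):
    """Pick the JDBC driver class name for a Connector/J version string."""
    major = int(version.split(".")[0])
    # Assumption: 8.x and later use the "cj" package, 5.x the legacy one.
    return "com.mysql.cj.jdbc.Driver" if major >= 8 else "com.mysql.jdbc.Driver"


jar = m2_jar_path("8.0.11")
driver = driver_class("8.0.11")
```

The value of driver would then go into the "driver" option of the JDBC read, and jar is the path handed to --jars.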

Using findspark

Use findspark.add_packages to provide the MySQL driver. Call it before the Spark context is created, for example:

import findspark

findspark.add_packages('mysql:mysql-connector-java:8.0.11')
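
As far as I can tell, findspark.add_packages works by adding a --packages entry to the PYSPARK_SUBMIT_ARGS environment variable, which PySpark reads when it starts the JVM. If findspark is not available, roughly the same effect can be sketched by hand. The function name add_packages_to_submit_args and its env parameter are hypothetical, and this has to run before the first SparkContext or SparkSession is created.

```python
import os


def add_packages_to_submit_args(packages, env=None):
    """Prepend a --packages entry to PYSPARK_SUBMIT_ARGS (hypothetical helper)."""
    env = os.environ if env is None else env
    # The value must end with "pyspark-shell" for the PySpark launcher.
    existing = env.get("PYSPARK_SUBMIT_ARGS", "") or "pyspark-shell"
    env["PYSPARK_SUBMIT_ARGS"] = "--packages {} {}".format(",".join(packages), existing)
    return env["PYSPARK_SUBMIT_ARGS"]


# Maven coordinate taken from the answer above.
submit_args = add_packages_to_submit_args(["mysql:mysql-connector-java:8.0.11"])
```

Spark then resolves the coordinate from Maven at startup, so no local jar path is needed.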
reith answered Sep 19 '22 11:09
