I want to connect MySQL to PySpark. I am running PySpark from a Jupyter notebook. However, when I run this:
dataframe_mysql = sqlContext.read.format("jdbc").options(
url="jdbc:mysql://localhost:3306/playground",
driver = "com.mysql.jdbc.Driver",
dbtable = "play1",
user="root",
password="sp123").load()
I get this error:
Py4JJavaError: An error occurred while calling o89.load. : java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.
How can I resolve this error and load MySQL data into a PySpark DataFrame?
I use a Python script like this:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName('test') \
    .master('local[*]') \
    .config("spark.driver.extraClassPath", "<path to mysql-connector-java-5.1.49-bin.jar>") \
    .getOrCreate()
df = spark.read.format("jdbc") \
    .option("url", "jdbc:mysql://localhost/<database_name>") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "<table_name>") \
    .option("user", "<user>") \
    .option("password", "<password>") \
    .load()
Replace anything in <> with your own values.
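If you read more than one table, it can help to build the JDBC options in one place. This is only a sketch: `mysql_jdbc_options` and `load_table` are hypothetical helper names, and the defaults (localhost, port 3306, the `com.mysql.jdbc.Driver` class used above) are assumptions you should adjust. With Connector/J 8.x the driver class is `com.mysql.cj.jdbc.Driver`.

```python
def mysql_jdbc_options(database, table, user, password,
                       host="localhost", port=3306,
                       driver="com.mysql.jdbc.Driver"):
    """Build the option dict for spark.read.format("jdbc") (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        # Assumption: Connector/J 5.x class name; 8.x uses com.mysql.cj.jdbc.Driver.
        "driver": driver,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_table(spark, **kwargs):
    """Read one MySQL table into a DataFrame using an existing SparkSession."""
    options = mysql_jdbc_options(**kwargs)
    return spark.read.format("jdbc").options(**options).load()
```

Usage would be something like `df = load_table(spark, database="playground", table="play1", user="root", password="sp123")`. The connector jar still has to be on the classpath, otherwise you get the same ClassNotFoundException.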
pyspark
Install the MySQL Java connector driver with Maven or Gradle, or download the jar file directly. Then pass the jar path to pyspark with the --jars
argument. If you chose the Maven approach, the command for MySQL connector version 8.0.11 looks like this:
pyspark --jars "${HOME}/.m2/repository/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar"
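The path above follows the standard local Maven repository layout (group ID split on dots, then artifact, then version). If you want to compute it rather than type it, here is a rough sketch. `maven_jar_path` is a hypothetical helper, and it assumes the default repository location `~/.m2/repository` unless you pass another one.

```python
from pathlib import Path


def maven_jar_path(coordinate, repo=None):
    """Map 'group:artifact:version' to the jar path in a local Maven repository."""
    group, artifact, version = coordinate.split(":")
    # Assumption: the default local repository lives under ~/.m2/repository.
    root = Path(repo) if repo is not None else Path.home() / ".m2" / "repository"
    return root.joinpath(*group.split("."), artifact, version,
                         f"{artifact}-{version}.jar")


if __name__ == "__main__":
    print(maven_jar_path("mysql:mysql-connector-java:8.0.11"))
```

The printed path is what you would hand to --jars. The file only exists there if Maven has already downloaded that version.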
findspark
Use findspark.add_packages to provide the MySQL driver before you create the Spark session, like:
import findspark
findspark.add_packages('mysql:mysql-connector-java:8.0.11')
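As far as I know, this works by adding a --packages flag to the PYSPARK_SUBMIT_ARGS environment variable, so you can do roughly the same thing by hand if you would rather not use findspark. This is a sketch under that assumption, not findspark's actual code. `add_packages_to_submit_args` is a hypothetical helper, and it must run before the first SparkSession or SparkContext is created in the notebook.

```python
import os


def add_packages_to_submit_args(*coordinates):
    """Prepend a --packages flag to PYSPARK_SUBMIT_ARGS and return the new value."""
    # "pyspark-shell" must stay at the end when pyspark is launched from Python.
    existing = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
    value = f"--packages {','.join(coordinates)} {existing}"
    os.environ["PYSPARK_SUBMIT_ARGS"] = value
    return value


if __name__ == "__main__":
    print(add_packages_to_submit_args("mysql:mysql-connector-java:8.0.11"))
```

If a session is already running, restart the kernel first. The variable is only read when the JVM starts.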