I want to run my existing application with Apache Spark and MySQL.
The idea is simple: Spark can read MySQL data via JDBC and can also execute SQL queries, so we can connect it directly to MySQL and run the queries there. Why would this be faster? For long-running queries (reporting or BI, for example), it can be much faster because Spark is a massively parallel system that spreads the work across many cores and machines.
In Spark, we can set the read format to "jdbc" and supply the database URL, username, and password to read a table. A count on a DataFrame built from the employees table returns the same number of rows as the table holds in MySQL.
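One caveat about the parallelism: unless partitioning options are supplied, Spark's JDBC source reads the table through a single connection. The sketch below builds the options for a range-partitioned read. The helper names `partitioned_jdbc_options` and `read_in_parallel` are hypothetical, and the partition column and bounds are assumptions to adjust for your table; only the option keys come from Spark's JDBC data source.

```python
def partitioned_jdbc_options(url, table, user, password,
                             column, lower, upper, num_partitions):
    """Return JDBC reader options that split one read into parallel range scans.

    Hypothetical helper; only the option keys are Spark's own.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "partitionColumn": column,  # must be a numeric, date, or timestamp column
        "lowerBound": str(lower),   # the bounds only set the stride; no rows are filtered out
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def read_in_parallel(spark, **kwargs):
    """Load the table as a DataFrame; `spark` is assumed to be an existing SparkSession."""
    options = partitioned_jdbc_options(**kwargs)
    return spark.read.format("jdbc").options(**options).load()
```

With these options each partition issues its own range query against MySQL, so `numPartitions` also caps the number of concurrent connections; a value the database can comfortably serve is usually a better choice than the largest one Spark can schedule.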
Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources.
From PySpark, this works for me:
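# mySqlContext is assumed to be an existing SQLContext.
# "com.mysql.jdbc.Driver" is the Connector/J 5.x class name; Connector/J 8.x uses "com.mysql.cj.jdbc.Driver".
dataframe_mysql = mySqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/my_bd_name",
    driver="com.mysql.jdbc.Driver",
    dbtable="my_tablename",
    user="root",
    password="root",
).load()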
With Spark 2.0.x, you can use DataFrameReader and DataFrameWriter. Use SparkSession.read to access a DataFrameReader and Dataset.write to access a DataFrameWriter.
Assuming you are using spark-shell:
val prop = new java.util.Properties()
prop.put("user", "username")
prop.put("password", "yourpassword")
val url = "jdbc:mysql://host:port/db_name"
val df = spark.read.jdbc(url, "table_name", prop)
df.show()
The same read, following the example in the Spark documentation:
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://host:port/db_name")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
If you want to read data from a query result rather than a table, wrap the query in parentheses, give it an alias, and pass it as dbtable:
val sql = """select * from db.your_table where id>1"""
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://host:port/db_name")
  // The alias "t" is required: MySQL rejects a derived table without one.
  .option("dbtable", s"( $sql ) t")
  .option("user", "username")
  .option("password", "password")
  .load()
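The same wrapping can be done from PySpark. A minimal sketch, assuming `spark` is an existing SparkSession; `as_dbtable` and `read_query` are hypothetical helper names, not part of Spark:

```python
def as_dbtable(query, alias="t"):
    """Wrap a SELECT statement so it can be passed as the JDBC `dbtable` option."""
    # Drop surrounding whitespace and any trailing semicolon, which would
    # otherwise end up inside the parentheses and break the generated SQL.
    cleaned = query.strip().rstrip(";").strip()
    if not cleaned:
        raise ValueError("query must not be empty")
    return f"({cleaned}) {alias}"


def read_query(spark, url, query, user, password):
    """Run `query` on the database and return the result as a DataFrame."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", as_dbtable(query))
        .option("user", user)
        .option("password", password)
        .load()
    )
```

Because the subquery runs on the MySQL side, only the matching rows travel to Spark, which is usually cheaper than loading the whole table and filtering afterwards.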
To write a DataFrame back to MySQL:
import org.apache.spark.sql.SaveMode

val prop = new java.util.Properties()
prop.put("user", "username")
prop.put("password", "yourpassword")
val url = "jdbc:mysql://host:port/db_name"
// df is the DataFrame that holds the data you want to write.
df.write.mode(SaveMode.Append).jdbc(url, "table_name", prop)
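For PySpark users, a roughly equivalent write might look like the sketch below. `append_to_mysql` is a hypothetical wrapper, `df` is assumed to be an existing DataFrame, and the MySQL JDBC driver is assumed to be on Spark's classpath.

```python
def append_to_mysql(df, url, table, user, password):
    """Append the rows of `df` to `table` over JDBC."""
    properties = {"user": user, "password": password}
    # mode="append" adds rows to an existing table; "overwrite" would replace
    # its contents, and "error" (the default) fails if the table already exists.
    df.write.jdbc(url=url, table=table, mode="append", properties=properties)
```

A call could then read `append_to_mysql(df, "jdbc:mysql://host:3306/db_name", "table_name", "username", "yourpassword")`, where the port 3306 is MySQL's default and stands in for whatever your server uses.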