Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to integrate Apache Spark with MySQL for reading database tables as a spark dataframe? [closed]

I want to run my existing application with Apache Spark and MySQL.

like image 460
pangkaj paul Avatar asked Dec 31 '14 08:12

pangkaj paul


People also ask

Can Spark read from MySQL?

The idea is simple: Spark can read MySQL data via JDBC and can also execute SQL queries, so we can connect it directly to MySQL and run the queries. Why is this faster? For long-running (i.e., reporting or BI) queries, it can be much faster as Spark is a massively parallel system.

How does Spark read data from database?

In spark, we can pass read format as “jdbc” with database url, username and password to read same table. We can notice that we are getting the same number of row as count when we have data frame based on employees table.

Can Spark SQL read data from other databases?

Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources.


2 Answers

From pySpark, it work for me :

dataframe_mysql = mySqlContext.read.format("jdbc").options(     url="jdbc:mysql://localhost:3306/my_bd_name",     driver = "com.mysql.jdbc.Driver",     dbtable = "my_tablename",     user="root",     password="root").load() 
like image 150
cherah30 Avatar answered Sep 21 '22 17:09

cherah30


With spark 2.0.x,you can use DataFrameReader and DataFrameWriter. Use SparkSession.read to access DataFrameReader and use Dataset.write to access DataFrameWriter.

Suppose using spark-shell.

read example

val prop=new java.util.Properties() prop.put("user","username") prop.put("password","yourpassword") val url="jdbc:mysql://host:port/db_name"  val df=spark.read.jdbc(url,"table_name",prop)  df.show() 

read example 2

val jdbcDF = spark.read   .format("jdbc")   .option("url", "jdbc:mysql:dbserver")   .option("dbtable", "schema.tablename")   .option("user", "username")   .option("password", "password")   .load() 

from spark doc

read example3

If you want to read data from a query result rather than a table.

val sql="""select * from db.your_table where id>1""" val jdbcDF = spark.read   .format("jdbc")   .option("url", "jdbc:mysql:dbserver")   .option("dbtable",  s"( $sql ) t")   .option("user", "username")   .option("password", "password")   .load() 

write example

import org.apache.spark.sql.SaveMode  val prop=new java.util.Properties() prop.put("user","username") prop.put("password","yourpassword") val url="jdbc:mysql://host:port/db_name" //df is a dataframe contains the data which you want to write. df.write.mode(SaveMode.Append).jdbc(url,"table_name",prop) 

中文版戳我

like image 24
Liam Avatar answered Sep 22 '22 17:09

Liam