How to execute SQL queries in Apache Spark

I am very new to Apache Spark.
I have already configured Spark 2.0.2 on my local Windows machine and have worked through the "word count" example.
Now I am stuck on executing SQL queries. I have searched for guidance on this but have not found anything helpful.

asked Nov 28 '16 by rajkumar chilukuri

People also ask

How do I run a SQL query on Spark?

In Spark 2.0.2, SparkSession contains a SparkContext instance as well as a sqlContext instance. Create a SparkSession, load your data (in this case from MySQL), register it as a temporary view, and then run your SQL query just as you would against a SQL database.

Can we use SQL queries directly in Spark?

Spark SQL can read directly from multiple sources (files, HDFS, JSON/Parquet files, existing RDDs, Hive, etc.) and ensures fast execution of existing Hive queries; Spark SQL can execute up to 100x faster than Hadoop MapReduce.

Does Apache Spark support SQL?

Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs (Spark's distributed datasets) and in external sources. Spark SQL conveniently blurs the lines between RDDs and relational tables.

What are Spark SQL query execution phases?

Catalyst's general tree transformation framework is applied in four phases: analysis, logical optimization, physical planning, and code generation.


1 Answer

In Spark 2.0.2 we have SparkSession, which contains a SparkContext instance as well as a sqlContext instance. So the steps are:

Step 1: Create SparkSession

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()

Step 2: Load the table from the database, in your case MySQL.

val loadedData = spark
      .read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydatabase")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "mytable")
      .option("user", "root")
      .option("password", "toor")
      .load()

loadedData.createOrReplaceTempView("mytable")
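For the JDBC read above to work, the MySQL connector has to be on Spark's classpath. A minimal build.sbt sketch (the version numbers here are illustrative, matching the Spark 2.0.2 setup in the question, not prescriptive):

```scala
// build.sbt — illustrative versions; adjust to your Spark and MySQL setup
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "2.0.2" % "provided",
  "mysql"            %  "mysql-connector-java" % "5.1.40"
)
```

If you are experimenting in the spark-shell instead, you can pass the connector jar on the command line with `--jars`, e.g. `spark-shell --jars mysql-connector-java-5.1.40-bin.jar`.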

Step 3: Now you can run your SQL query just as you would against a SQL database.

val dataFrame = spark.sql("SELECT * FROM mytable")
dataFrame.show()

P.S.: It would be better to use the DataFrame API, or better still the Dataset API, but for those you will need to go through the documentation.
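As a rough sketch of the Dataset route (the `User` case class and its fields are assumptions for illustration, not from the original question; adjust them to your table's schema):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical row type for the MySQL table; adjust fields to your schema.
case class User(id: Long, name: String)

object DatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()
    import spark.implicits._

    // Same JDBC options as in Step 2, but converted to a typed Dataset[User]
    // instead of registering a temp view and writing SQL strings.
    val users = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydatabase")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "mytable")
      .option("user", "root")
      .option("password", "toor")
      .load()
      .as[User]

    // Typed equivalent of: SELECT name FROM mytable WHERE id > 100
    users.filter(_.id > 100).map(_.name).show()

    spark.stop()
  }
}
```

The upside of the typed API is that field references like `_.id` are checked at compile time, whereas a typo inside a `spark.sql(...)` string only fails at runtime.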

Link to Documentation: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.sql.Dataset

answered Sep 21 '22 by Shivansh