Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

which is faster spark.sql or df.filter("").select("") . using scala

I have a table that is a df which also has a view created with

table.createOrReplaceTempView("table")

and the query is

spark.sql("SELECT column1 from TABLE where column2 = 'VALUE'")

i want to rewrite the query as

TABLE.filter(TABLE("column2") === "value").select(col("column1"))

so which query is much faster the normal spark.sql or with filter and select ? when using large datasets.


2 Answers

I assume that if their physical execution plan is exactly the same, performance will be the same as well. So let's do a test, on Spark 2.2.0:

scala> import spark.implicits._
import spark.implicits._

scala> case class Record(column1: String, column2: String)
defined class Record

scala> val table = List(Record("foo", "value"), Record("bar", "notvalue")).toDF
table: org.apache.spark.sql.DataFrame = [column1: string, column2: string]

scala> table.createOrReplaceTempView("table")

scala> val a = spark.sql("SELECT column1 from TABLE where column2 = 'value'")
a: org.apache.spark.sql.DataFrame = [column1: string]

scala> val b = table.filter(table("column2") === "value").select(col("column1")) 
b: org.apache.spark.sql.DataFrame = [column1: string]

scala> a.explain()
== Physical Plan ==
*Project [column1#41]
+- *Filter (isnotnull(column2#42) && (column2#42 = value))
   +- LocalTableScan [column1#41, column2#42]

scala> b.explain()
== Physical Plan ==
*Project [column1#41]
+- *Filter (isnotnull(column2#42) && (column2#42 = value))
   +- LocalTableScan [column1#41, column2#42]

Looks like there's no difference at all...

like image 56
Jasper-M Avatar answered Sep 13 '25 13:09

Jasper-M


Depends on your use case just try both of them which works fast is the best suit for you !

I would recommend you to use

1.spark.time(df.filter(“”).select(“”)) 

2.spark.time(spark.sql("")) 

you can print out the time and use the one which takes the least time to execute in your code to run it faster.

like image 40
Sundeep Avatar answered Sep 13 '25 12:09

Sundeep