
Which is faster: Spark SQL with a WHERE clause, or a DataFrame filter after Spark SQL?

Which is faster: Spark SQL with a WHERE clause, or calling filter on the DataFrame returned by Spark SQL?

For example:

SELECT col1, col2 FROM tab1 WHERE col1 = val;

or

DataFrame df = sqlContext.sql("SELECT col1, col2 FROM tab1");

df.filter("col1 = val");

Bipul Debnath asked Nov 07 '16 12:11


1 Answer

Calling the explain method to see the physical plan is a good way to compare how two queries will be executed.

For example, using the bank table from the Zeppelin Tutorial notebook:

sqlContext.sql("select age, job from bank").filter("age = 30").explain

and

sqlContext.sql("select age, job from bank where age = 30").explain

Both queries have exactly the same physical plan:

== Physical Plan ==
Project [age#5,job#6]
+- Filter (age#5 = 30)
   +- Scan ExistingRDD[age#5,job#6,marital#7,education#8,balance#9]

So the performance should be the same.
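
To reproduce the comparison outside Zeppelin, a minimal sketch along these lines may help. It assumes Spark 2.x or later (SparkSession instead of sqlContext) running in local mode; the object name FilterVsWhere and the three sample rows are hypothetical stand-ins for the tutorial's bank table, not part of the original notebook.

```scala
import org.apache.spark.sql.SparkSession

object FilterVsWhere {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session; the original answer used sqlContext.
    val spark = SparkSession.builder()
      .appName("filter-vs-where")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data registered under the same table name as the tutorial.
    Seq((30, "admin"), (45, "technician"), (30, "services"))
      .toDF("age", "job")
      .createOrReplaceTempView("bank")

    // Variant 1: filter applied to the DataFrame returned by the SQL query.
    val filtered = spark.sql("select age, job from bank").filter("age = 30")
    // Variant 2: the same predicate written as a WHERE clause.
    val whereClause = spark.sql("select age, job from bank where age = 30")

    // Print both physical plans so they can be compared side by side.
    filtered.explain()
    whereClause.explain()

    spark.stop()
  }
}
```

The exact plan text depends on the Spark version and the data source, so the useful check is whether the two printed plans match each other, not whether they match the output shown above.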

Though I think select age, job from bank where age = 30 is more readable in this case.

Rockie Yang answered Nov 15 '22 09:11