 

spark.sql.crossJoin.enabled for Spark 2.x

I am using the 'preview' Google Dataproc Image 1.1 with Spark 2.0.0. To complete one of my operations I have to compute a Cartesian product. Since version 2.0.0 there has been a Spark configuration parameter (spark.sql.crossJoin.enabled) that prohibits Cartesian products, and an exception is thrown. How can I set spark.sql.crossJoin.enabled=true, preferably by using an initialization action?
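Note on the Dataproc side: instead of an initialization action, Spark properties can be set at cluster-creation time with the `--properties` flag, where the `spark:` prefix writes the value into `spark-defaults.conf`. A sketch, with a placeholder cluster name:

```shell
# Set spark.sql.crossJoin.enabled for every job on the cluster.
# "my-cluster" is a placeholder; the spark: prefix targets spark-defaults.conf.
gcloud dataproc clusters create my-cluster \
    --image-version 1.1 \
    --properties spark:spark.sql.crossJoin.enabled=true
```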

Stijn asked Aug 17 '16 14:08
1 Answer

Spark >= 3.0

spark.sql.crossJoin.enabled is true by default (SPARK-28621).

Spark >= 2.1

You can use crossJoin:

df1.crossJoin(df2)

It makes your intention explicit and keeps the more conservative configuration in place to protect you from unintended cross joins.

Spark 2.0

SQL properties can be set dynamically at runtime with the RuntimeConfig.set method, so you should be able to call

spark.conf.set("spark.sql.crossJoin.enabled", true)

whenever you want to explicitly allow Cartesian product.

zero323 answered Oct 18 '22 20:10