Are the join types defined as constants somewhere accessible in Apache Spark?

I haven't found them after a cursory glance at the Spark codebase. In most documentation and tutorial examples, people seem to use 'naked' string literals to specify join types. Does Spark provide an object or class defining "leftouter", "inner", "cross", etc. as public vals, or is relying on string literals simply the convention?

That is to say, is there an alternative to:

import org.apache.spark.sql.functions.expr

dataframe.join(
  right = anotherDataFrame,
  joinExprs = expr("1 = 1"),
  joinType = "leftouter"
)

that would look something like:

dataframe.join(
  right = anotherDataFrame, 
  joinExprs = expr("1 = 1"),
  joinType = SparkJoins.LeftOuter
)

?

asked Mar 27 '18 by Tobias Roland
1 Answer

You could use the objects included in the org.apache.spark.sql.catalyst.plans package; for instance, the following objects correspond to each string literal (see the sketch below the source link):

  • for "INNER" => org.apache.spark.sql.catalyst.plans.Inner.sql
  • for "LEFT OUTER" => org.apache.spark.sql.catalyst.plans.LeftOuter.sql
  • etc.

source: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala
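Here is a minimal, self-contained sketch of using these objects in place of raw strings. The DataFrame names are hypothetical, and note two hedges: these objects live in the catalyst package, which is Spark-internal and not covered by the same compatibility guarantees as the public API; and in the Spark sources the string parser behind Dataset.join strips underscores but, as far as I can tell, not spaces, so multi-word types such as LeftOuter.sql (which renders as "LEFT OUTER") appear to need normalizing before being passed as a joinType string.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter}

object JoinTypeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-type-constants")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical example frames.
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val right = Seq((1, "x")).toDF("id", "other")

    // Inner.sql renders as "INNER", which the string parser accepts directly.
    left.join(right, expr("1 = 1"), Inner.sql).show()

    // Caveat: LeftOuter.sql renders as "LEFT OUTER" (with a space), which the
    // parser may reject, so replace the space with an underscore first.
    left.join(right, expr("1 = 1"), LeftOuter.sql.replace(" ", "_")).show()

    spark.stop()
  }
}

This still routes through the string-typed overload of join, so it mainly buys you a compile-time reference to the join type rather than a fully typed API, but it avoids scattering bare literals through the code.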

answered Oct 11 '22 by evinhas