Is there a way to use broadcast in Spark SQL statement?
For example:
SELECT
Column
FROM
broadcast (Table 1)
JOIN
Table 2
ON
Table1.key = Table2.key
And in my case, Table 1 is also a sub query.
Below is the syntax for Broadcast join:
SELECT /*+ BROADCAST(Table 2) */ COLUMN
FROM Table 1 join Table 2
on Table1.key= Table2.key
To check if broadcast join occurs or not you can check in Spark UI port number 18080 in the SQL tab.
The reason we need to ensure whether broadcast join is actually working is because earlier we were using the below syntax: /* BROADCASTJOIN(Table2) */ which did not throw syntax error but in the UI it was performing sort merge join
Hence it is essential to ensure our joins are working as expected
In Spark 2.2 or later you can use planner hints:
SELECT /*+ MAPJOIN(Table1) */ COLUMN
FROM Table1 JOIN Table2
ON Table1.key = Table2.key
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With