
Spark SQL statement broadcast

Is there a way to use broadcast in a Spark SQL statement?

For example:

SELECT
    Column
FROM
    broadcast(Table1)
JOIN
    Table2
ON
    Table1.key = Table2.key

In my case, Table1 is also a subquery.

HappyHua asked Aug 04 '17 08:08


2 Answers

Below is the syntax for a broadcast join:

SELECT /*+ BROADCAST(Table2) */ COLUMN
FROM Table1 JOIN Table2
ON Table1.key = Table2.key

  1. To check whether a broadcast join actually occurs, look at the query plan in the SQL tab of the Spark UI (port 18080 is the default for the History Server; a running application's UI defaults to port 4040).

  2. We need to verify this because we were previously using the syntax /* BROADCASTJOIN(Table2) */, which did not throw a syntax error but showed a sort merge join in the UI. As written, that form lacks the + after /*, so Spark treats it as an ordinary comment rather than a hint.

  3. Hence it is essential to confirm that joins are executing as expected.
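
The plan can also be checked without opening the UI by prefixing the query with EXPLAIN. The following is a minimal sketch, assuming Table1 and Table2 are registered tables or views that each have a key column (the names are taken from the question). If the hint is honored, the physical plan should normally show a BroadcastHashJoin node instead of a SortMergeJoin.

```sql
-- Print the physical plan instead of executing the query.
-- Table1, Table2 and key are the placeholder names from the question.
EXPLAIN
SELECT /*+ BROADCAST(Table2) */ Table1.key
FROM Table1 JOIN Table2
ON Table1.key = Table2.key;
```

If the output still shows SortMergeJoin, the hint was probably not recognized, for example because the comment does not start with /*+ or the name in the hint does not match the table name or alias used in the query.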

Sushmita Konar answered Sep 19 '22 14:09


In Spark 2.2 or later you can use planner hints:

SELECT /*+ MAPJOIN(Table1) */ COLUMN
FROM Table1 JOIN Table2
ON Table1.key = Table2.key
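
Since Table 1 in the question is itself a subquery, the hint can name the subquery's alias rather than a table. This is only a sketch: the alias t1 and the WHERE clause are hypothetical stand-ins for the real subquery, and Table1, Table2 and key are the placeholder names from the question.

```sql
-- t1 is a hypothetical alias for the subquery; the hint refers to that alias.
SELECT /*+ BROADCAST(t1) */ t1.key
FROM (
    SELECT key
    FROM Table1
    WHERE key IS NOT NULL   -- placeholder for the real subquery logic
) t1
JOIN Table2
ON t1.key = Table2.key;
```

As with any hint, it is worth confirming in the query plan that a broadcast join was actually chosen.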

Alper t. Turker answered Sep 22 '22 14:09