
Spark SQL broadcast hint intermediate tables

I have a problem using broadcast hints (it may just be a gap in my SQL knowledge).

I have a query like

SELECT * /* broadcast(a) */
FROM a 
INNER JOIN b
ON ....
INNER JOIN c
on ....

I would like to do

SELECT * /* broadcast(a) */
FROM a 
INNER JOIN b 
ON ....
INNER JOIN c /* broadcast(AjoinedwithB) */
on ....

That is, I want to force a broadcast join (I would prefer not to change Spark parameters to force it everywhere), but I don't know how to refer to the intermediate result of joining a with b, written as AjoinedwithB above.

Of course I could split the SQL, use the DataFrame API, and so on, but I would like to do it in a single SQL query.

BiS asked Nov 01 '25 14:11


1 Answer

You can use either a subquery:

SELECT /*+ broadcast(a_b) */ *
FROM 
    (SELECT /*+ broadcast(a) */ * FROM a JOIN b ON ...) AS a_b 
    JOIN c ON ...

or CTE:

WITH a_b AS (SELECT /*+ broadcast(a) */ * FROM a JOIN b ON ...)
SELECT /*+ broadcast(a_b) */ * FROM a_b JOIN c ON ...
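
To check whether the hints take effect, one option is to run the query under EXPLAIN and look for BroadcastHashJoin nodes in the physical plan. The sketch below is only an illustration: the join conditions are elided in the question, so the column names (id, a_val, b_val, c_val) and the tiny inline tables are hypothetical stand-ins. It also selects explicit columns inside the CTE, since SELECT * over a join on a shared key would leave two id columns in a_b and make a_b.id ambiguous.

```sql
-- Hypothetical sample data; replace with your real tables.
CREATE OR REPLACE TEMPORARY VIEW a AS
SELECT * FROM VALUES (1, 'a1'), (2, 'a2') AS t(id, a_val);

CREATE OR REPLACE TEMPORARY VIEW b AS
SELECT * FROM VALUES (1, 'b1'), (2, 'b2') AS t(id, b_val);

CREATE OR REPLACE TEMPORARY VIEW c AS
SELECT * FROM VALUES (1, 'c1'), (2, 'c2') AS t(id, c_val);

-- Print the plan instead of running the query.
EXPLAIN
WITH a_b AS (
    -- Inner hint: ask Spark to broadcast a when joining it with b.
    SELECT /*+ BROADCAST(a) */ a.id, a.a_val, b.b_val
    FROM a JOIN b ON a.id = b.id
)
-- Outer hint: ask Spark to broadcast the intermediate result a_b.
SELECT /*+ BROADCAST(a_b) */ *
FROM a_b JOIN c ON a_b.id = c.id;
```

Tables this small would most likely be broadcast anyway, because they fall under spark.sql.autoBroadcastJoinThreshold, so the plan is only conclusive when checked against realistically sized data. How a hint resolves against a CTE or subquery alias may also differ between Spark versions, which is another reason to inspect the plan rather than assume the hint was honoured.
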
user10938362 answered Nov 04 '25 02:11