What is semi-join in database?

I am having trouble understanding the concept of a semi-join and how it differs from a conventional join. I have read some articles already but was not satisfied with the explanations. Could someone please help me understand it?

Henu asked Feb 15 '17 12:02


People also ask

What is a semi join?

Semijoins are U-SQL's way to filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern. There are two variants: LEFT SEMIJOIN and RIGHT SEMIJOIN.

What is semi join with example?

In distributed databases, semijoin is also a technique for processing a join between two tables that are stored at different sites. The basic idea is to reduce the transfer cost by first sending only the projected join column(s) to the other site, where they are joined with the second relation.
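To make the transfer-cost idea concrete, here is a minimal Python sketch of a semijoin reduction between two sites. The relations R and S, their contents, and the final step of shipping the reduced relation back for the full join are assumptions for illustration, not something the snippet above spells out.

```python
# Hypothetical data: relation R lives at site 1, relation S at site 2.
R = [(1, "alice"), (2, "bob"), (3, "carol")]   # (dept_id, name)
S = [(1, "sales"), (3, "ops"), (4, "legal")]   # (dept_id, dept)

def semijoin_reduce(r_rows, s_rows):
    # Step 1 (site 1): project only the join column of R. This small set
    # is all that crosses the network, instead of the whole relation.
    shipped_keys = {dept_id for dept_id, _ in r_rows}

    # Step 2 (site 2): keep only the S rows whose key was shipped.
    # This is S semi-joined with R.
    reduced_s = [row for row in s_rows if row[0] in shipped_keys]

    # Step 3 (assumed): send the reduced S back to site 1 and finish the join.
    depts_by_key = {}
    for dept_id, dept in reduced_s:
        depts_by_key.setdefault(dept_id, []).append(dept)
    joined = [
        (dept_id, name, dept)
        for dept_id, name in r_rows
        for dept in depts_by_key.get(dept_id, [])
    ]
    return shipped_keys, reduced_s, joined

shipped_keys, reduced_s, joined = semijoin_reduce(R, S)
print(reduced_s)  # [(1, 'sales'), (3, 'ops')]
print(joined)     # [(1, 'alice', 'sales'), (3, 'carol', 'ops')]
```

The row (4, 'legal') never leaves site 2, which is where the saving comes from.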

What is semi join and anti join?

An anti-join is essentially the opposite of a semi-join: While a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found.
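A small sketch of that contrast, run through Python's built-in sqlite3 module. The customers and orders tables and their rows are hypothetical; EXISTS and NOT EXISTS are used here as one common way to write a semi-join and an anti-join in plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cho');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

def names(sql):
    # Return the first column of every result row as a plain list.
    return [row[0] for row in conn.execute(sql)]

# Semi-join: customers with at least one order, one copy each.
semi = names("""
    SELECT c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""")

# Anti-join: customers with no order at all.
anti = names("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""")

print(semi)  # ['Ann', 'Cho']
print(anti)  # ['Ben']
```

Ann has two orders but appears once in the semi-join result, and the two results together cover every customer exactly once.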

Is Semi join same as inner join?

A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x. This is a filtering join.


2 Answers

A simple example. Let's select the students who have grades, using a left outer join:

SELECT DISTINCT s.id
FROM  students s
      LEFT JOIN grades g ON g.student_id = s.id
WHERE g.student_id IS NOT NULL

Now the same with left semi-join:

SELECT s.id
FROM  students s
WHERE EXISTS (SELECT 1 FROM grades g
              WHERE g.student_id = s.id)

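The latter is often more efficient: the engine can stop at the first matching grade for each student and needs no DISTINCT step to remove duplicates. The actual difference depends on the DBMS and its query optimizer.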
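The two queries can be checked against each other with Python's built-in sqlite3 module. This is only a sketch: the table contents are made up, an ORDER BY is added so the results compare deterministically, and it says nothing about performance on any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE grades (student_id INTEGER, grade TEXT);
    INSERT INTO students (id) VALUES (1), (2), (3);
    -- Student 1 has two grades, student 2 has none.
    INSERT INTO grades (student_id, grade) VALUES (1, 'A'), (1, 'B'), (3, 'C');
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# A plain inner join repeats student 1 once per grade.
inner_join = ids("""
    SELECT s.id FROM students s
    JOIN grades g ON g.student_id = s.id
    ORDER BY s.id
""")

# The outer-join version needs DISTINCT to collapse those repeats.
left_distinct = ids("""
    SELECT DISTINCT s.id FROM students s
    LEFT JOIN grades g ON g.student_id = s.id
    WHERE g.student_id IS NOT NULL
    ORDER BY s.id
""")

# The semi-join version never produces the repeats in the first place.
semi = ids("""
    SELECT s.id FROM students s
    WHERE EXISTS (SELECT 1 FROM grades g WHERE g.student_id = s.id)
    ORDER BY s.id
""")

print(inner_join)     # [1, 1, 3]
print(left_distinct)  # [1, 3]
print(semi)           # [1, 3]
```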

Iurii Ant answered Oct 14 '22 05:10


As far as I know, the SQL dialects that support explicit SEMIJOIN/ANTISEMIJOIN syntax are U-SQL and Cloudera Impala.

SEMIJOIN:

Semijoins are U-SQL’s way to filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.

More info in "Semi Join and Anti Join Should Have Their Own Syntax in SQL":

“Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.

-- IN
SELECT *
FROM Employee
WHERE DeptName IN (
  SELECT DeptName
  FROM Dept
)

-- EXISTS
SELECT *
FROM Employee
WHERE EXISTS (
  SELECT 1
  FROM Dept
  WHERE Employee.DeptName = Dept.DeptName
)
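The "only check if a join would yield results" idea can be sketched in a few lines of Python. The rows below are hypothetical, and the helper name left_semi_join is made up for illustration; Dept lists "Sales" twice on purpose to show that the left side is not multiplied.

```python
# Hypothetical rows for the Employee and Dept tables.
employees = [
    {"Name": "Harry", "DeptName": "Finance"},
    {"Name": "Sally", "DeptName": "Sales"},
    {"Name": "George", "DeptName": "Research"},
]
depts = [
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Sales", "Manager": "Charles"},
    {"DeptName": "Finance", "Manager": "George"},
]

def left_semi_join(left, right, key):
    # any() stops at the first matching right row, so a left row is kept
    # at most once and no right-hand columns are ever copied into it.
    return [l for l in left if any(l[key] == r[key] for r in right)]

result = left_semi_join(employees, depts, "DeptName")
print([row["Name"] for row in result])  # ['Harry', 'Sally']
```

An inner join on the same data would return Sally twice, once per "Sales" row, and would also carry the Manager column along.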

EDIT:

Another dialect that supports SEMI/ANTISEMI join is KQL:

kind=leftsemi (or kind=rightsemi)

Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.

let t1 = datatable(key:long, value:string)  
[1, "a",  
2, "b",
3, "c"];
let t2 = datatable(key:long)
[1,3];
t1 | join kind=leftsemi (t2) on key

Output:

key  value
1    a
3    c
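
For readers without a Kusto cluster at hand, the same semantics can be emulated in Python. This is a sketch of the behaviour described above, not of how Kusto implements it; the function names left_semi and right_semi are made up, and the data mirrors t1 and t2 from the query.

```python
t1 = [(1, "a"), (2, "b"), (3, "c")]  # (key, value)
t2 = [(1,), (3,)]                    # (key,)

def left_semi(left, right):
    # Collect the right-side keys once, then filter the left side.
    right_keys = {row[0] for row in right}
    return [row for row in left if row[0] in right_keys]

def right_semi(left, right):
    # Mirror image: keep right-side rows that have a match on the left.
    left_keys = {row[0] for row in left}
    return [row for row in right if row[0] in left_keys]

print(left_semi(t1, t2))   # [(1, 'a'), (3, 'c')]
print(right_semi(t1, t2))  # [(1,), (3,)]
```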

Lukasz Szozda answered Oct 14 '22 07:10