
INTERSECT in MySQL

People also ask

What is INTERSECT in MySQL?

An INTERSECT query returns the intersection of two or more result sets. A record that exists in every result set is included in the INTERSECT results; a record that is missing from any one of them is omitted.

What does INTERSECT in SQL mean?

The SQL INTERSECT operator combines two SELECT statements and returns only the rows that are common to both. Put simply, it acts as a mathematical set intersection.

What does INTERSECT command do?

The SQL INTERSECT clause/operator is used to combine two SELECT statements, but returns rows only from the first SELECT statement that are identical to a row in the second SELECT statement. This means INTERSECT returns only common rows returned by the two SELECT statements.
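
As a minimal sketch of the behaviour described above, assuming MySQL 8.0.31 or later (the first release with a native INTERSECT operator) and two hypothetical tables, a and b, that exist only for this illustration:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE a (n INT);
CREATE TABLE b (n INT);
INSERT INTO a VALUES (1), (2), (2), (3);
INSERT INTO b VALUES (2), (3), (3), (4);

-- Requires MySQL 8.0.31 or later.
SELECT n FROM a
INTERSECT
SELECT n FROM b;
-- Returns the values 2 and 3, once each: plain INTERSECT implies DISTINCT,
-- so the duplicate rows in a and b do not produce duplicate output.
```

The order of the returned rows is not guaranteed unless an ORDER BY is added.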

What can I use instead of INTERSECT in SQL?

MySQL versions before 8.0.31 have no INTERSECT operator, so it is usually emulated with an INNER JOIN. Note that an INNER JOIN without a join condition produces a Cartesian product: every combination of (row-from-first-table, row-from-second-table) is generated. With an appropriate ON or WHERE clause, however, the join applies the same logic as INTERSECT would.


You can use an inner join to filter for rows that have a matching row in another table:

SELECT DISTINCT records.id 
FROM records
INNER JOIN data d1 ON d1.id = records.firstname AND d1.value = 'john'
INNER JOIN data d2 ON d2.id = records.lastname AND d2.value = 'smith'

One of many other alternatives is an IN clause:

SELECT DISTINCT records.id 
FROM records
WHERE records.firstname IN (
    select id from data where value = 'john'
) AND records.lastname IN (
    select id from data where value = 'smith'
)
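
A correlated EXISTS is a close relative of the IN form. The following is only a sketch against the records and data tables from the test data given further down this page; the aliases r and d are chosen here for illustration:

```sql
-- Keep a record only if its firstname points at 'john'
-- and its lastname points at 'smith'.
SELECT r.id
FROM records AS r
WHERE EXISTS (
    SELECT 1 FROM data AS d
    WHERE d.id = r.firstname AND d.value = 'john'
) AND EXISTS (
    SELECT 1 FROM data AS d
    WHERE d.id = r.lastname AND d.value = 'smith'
);
-- With the test data on this page, this returns the single id 2.
```

Because each row of records is tested at most once, no DISTINCT is needed here.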

I think the UNION ALL method is much easier to follow, but it carries some overhead because the combined result initially holds many duplicate records. I use it on a database with about 10,000 to 50,000 records, typically intersecting about 5 queries, and the performance is acceptable.

All you do is UNION ALL the queries you want to intersect and keep the records that show up in every one of them.

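-- Each branch must return a given id at most once, or the count is inflated.
-- Selecting only the grouped column keeps the query valid under ONLY_FULL_GROUP_BY;
-- join the result back to data1 if the full rows are needed.
SELECT tbl.id FROM (

    (SELECT data1.id FROM data1 INNER JOIN data2 ON data1.id = data2.id WHERE data2.something = TRUE)
    UNION ALL
    (SELECT data1.id FROM data1 INNER JOIN data3 ON data1.id = data3.id WHERE data3.something = FALSE)

) AS tbl GROUP BY tbl.id HAVING COUNT(*) = 2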

So if we get the same record in both queries, its count will be 2 and the outer query will include it.
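
To make the counting idea concrete, here is a sketch of the same pattern applied to the records and data tables from the test data given further down this page. The DISTINCT in each branch is an addition made here so that one branch can never contribute the same id twice:

```sql
SELECT tbl.id
FROM (
    -- Branch 1: records whose firstname is 'john'.
    SELECT DISTINCT r.id
    FROM records AS r
    JOIN data AS d ON d.id = r.firstname
    WHERE d.value = 'john'

    UNION ALL

    -- Branch 2: records whose lastname is 'smith'.
    SELECT DISTINCT r.id
    FROM records AS r
    JOIN data AS d ON d.id = r.lastname
    WHERE d.value = 'smith'
) AS tbl
GROUP BY tbl.id
HAVING COUNT(*) = 2;  -- 2 = number of branches being intersected
-- With the test data on this page: branch 1 yields ids 1 and 2,
-- branch 2 yields ids 2 and 4, so only id 2 reaches a count of 2.
```

When intersecting more queries, the constant in the HAVING clause has to match the number of branches.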


Use joins instead:

SELECT records.id
FROM records
JOIN data AS D1 ON records.firstname = D1.id
JOIN data AS D2 ON records.lastname = D2.id
WHERE D1.value = 'john' and D2.value = 'smith'

Here's some test data:

CREATE TABLE records (id INT NOT NULL, firstname INT NOT NULL, lastname INT NOT NULL);
INSERT INTO records (id, firstname, lastname) VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 1),
(4, 2, 2);

CREATE TABLE data (id INT NOT NULL, value NVARCHAR(100) NOT NULL);
INSERT INTO data (id, value) VALUES
(1, 'john'),
(2, 'smith');

Expected result:

2

The test data is probably not useful to the poster, but it may help voters check that the solutions work correctly, and it lets people test their own answers before submitting them.


I'm a little late to the party, but I think the cleanest and best way to fully emulate INTERSECT is:

SELECT * FROM
( SELECT DISTINCT records.id FROM records, data WHERE data.id = records.firstname AND data.value = 'john' ) x1
NATURAL JOIN
( SELECT DISTINCT records.id FROM records, data WHERE data.id = records.lastname AND data.value = 'smith' ) x2
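
For comparison, on MySQL 8.0.31 or later the same two subqueries can be combined with the native operator rather than an emulation. This is a sketch against the test data above, not part of the original answer; the aliases r and d are illustrative:

```sql
-- Requires MySQL 8.0.31 or later.
SELECT r.id
FROM records AS r
JOIN data AS d ON d.id = r.firstname
WHERE d.value = 'john'

INTERSECT

SELECT r.id
FROM records AS r
JOIN data AS d ON d.id = r.lastname
WHERE d.value = 'smith';
-- With the test data on this page, this returns the single id 2.
```

One difference worth knowing: INTERSECT treats two NULLs as matching when comparing rows, whereas the equality comparison behind a NATURAL JOIN never matches NULL to NULL, so the emulation and the native operator can disagree on nullable columns.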