Is there a way to use intersect without selecting distinct values only? Something like INTERSECT ALL
.
For example, consider table A and B
A --> 1, 1, 1, 2, 3, 4
B --> 1, 1, 2
Would result in
Result --> 1, 1, 2
EDIT
I think this link explains well what I want. This other link is also intersting to understand the question. Or this other link explains event better.
EDIT 2
Suppose the tables:
Table A
╔════════╦════╦═══╦════╦════╗
║ A ║ B ║ C ║ D ║ E ║
╠════════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ House ║ 10 ║ 1 ║ NO ║ -5 ║
║ Monkey ║ 15 ║ 1 ║ OK ║ -1 ║
║ Dog ║ 3 ║ 1 ║ OK ║ -1 ║
╚════════╩════╩═══╩════╩════╝
Table B
╔═════╦════╦═══╦════╦════╗
║ A ║ B ║ C ║ D ║ E ║
╠═════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 15 ║ 1 ║ OK ║ -1 ║
║ Dog ║ 3 ║ 1 ║ OK ║ -1 ║
╚═════╩════╩═══╩════╩════╝
The answer for intersect (select * from A INTERSECT select * from B
) would be:
╔═════╦════╦═══╦════╦════╗
║ A ║ B ║ C ║ D ║ E ║
╠═════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Dog ║ 3 ║ 1 ║ OK ║ -1 ║
╚═════╩════╩═══╩════╩════╝
Because it takes only distinct values. What I want is taking common rows, just like:
╔═════╦════╦═══╦════╦════╗
║ A ║ B ║ C ║ D ║ E ║
╠═════╬════╬═══╬════╬════╣
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Car ║ 10 ║ 1 ║ OK ║ -1 ║
║ Dog ║ 3 ║ 1 ║ OK ║ -1 ║
╚═════╩════╩═══╩════╩════╝
Observe I don't need to know what I have to link (the connection is positional, just like INTERSECT
). The ID would be something constructed using all columns (the link between table are all columns, based on their position).
The SQL Server (Transact-SQL) INTERSECT operator is used to return the records that are in common between two SELECT statements or data sets. If a record exists in one query and not in the other, it will be omitted from the INTERSECT results. It is the intersection of the two SELECT statements.
The difference between UNION and INTERSECT is that UNION gets results from both queries and combines them, while INTERSECT gets results that only exist in both queries. So, if Query 1 returns records A and B, and Query 2 returns records B and C, UNION would return A, B and C. INTERSECT would only return B.
They are very different, even in your case. The INNER JOIN will return duplicates, if id is duplicated in either table. INTERSECT removes duplicates. The INNER JOIN will never return NULL , but INTERSECT will return NULL .
INTERSECT compares the data between tables and returns only the rows of data that exist in both tables. MINUS compares the data between tables and returns the rows of data that exist only in the first table you specify.
In SQL Server, INTERSECT
works on distinct rows only. If you want it to distinguish between duplicate rows, you will need to make the rows distinct. The only way to do so I can think of is to add another column and populate it with unique values per duplicate, but in such a way that the resulting rows would be matchable across different tables.
The problem, however, is that so far there is no universal syntax for that. For instance, you could use ROW_NUMBER() to enumerate every duplicate, but you would have to write out its PARTITION BY clause for every case individually: there is no PARTITION BY *
, not in SQL Server at least.
Anyway, for the purpose of illustration, here is how the ROW_NUMBER method would look like:
SELECT
A, B, C, D, E,
ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
FROM
dbo.A
INTERSECT
SELECT
A, B, C, D, E,
ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
FROM
dbo.B
;
As written above, the query would also return an extra column, the row number column, in the output. If you wanted to suppress it, you would need to make the query more complex:
SELECT
A, B, C, D, E
FROM
(
SELECT
A, B, C, D, E,
rn = ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
FROM
dbo.A
INTERSECT
SELECT
A, B, C, D, E,
rn = ROW_NUMBER() OVER (PARTITION BY A, B, C, D, E ORDER BY (SELECT 1))
FROM
dbo.B
) AS s
;
And just to clarify, when I said above there was no universal syntax, I meant you could not do it without resorting to dynamic SQL. With dynamic SQL, a great many things are possible but such a solution would be much more complex and, in my opinion, much less maintainable.
Again, to illustrate the point, this is an example of how you could solve it with dynamic SQL:
DECLARE
@table1 sysname,
@table2 sysname,
@columns nvarchar(max),
@sql nvarchar(max)
;
SET @table1 = 'dbo.A';
SET @table2 = 'dbo.B';
-- collecting the columns from one table only,
-- assuming the structures of both tables are identical
-- if the structures differ, declare and populate
-- @columns1 and @columns2 separately
SET @columns = STUFF(
(
SELECT
N', ' + QUOTENAME(name)
FROM
sys.columns
WHERE
object_id = OBJECT_ID(@table1)
FOR XML
PATH (''), TYPE
).value('text()[1]', 'nvarchar(max)'),
1,
2,
''
);
SET @sql =
N'SELECT ' + @columns + N'
FROM
(
SELECT
' + @columns + N',
ROW_NUMBER() OVER (PARTITION BY ' + @columns + N' ORDER BY (SELECT 1))
FROM
' + @table1 + N'
INTERSECT
SELECT
' + @columns + N',
ROW_NUMBER() OVER (PARTITION BY ' + @columns + N' ORDER BY (SELECT 1))
FROM
' + @table2 + N'
) AS s
';
EXECUTE sp_executesql @sql;
You can probably see now what I meant by "much more complex" at least.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With