I have a table Hyperlink with two columns, source and destination, that store the source and destination of each hyperlink.
source | destination
--------------------
a | b
b | c
c | d
c | b
There are two hyperlinks involving both b and c, and they differ only in direction. My objective, however, is to retrieve unique hyperlinks regardless of direction, so for a pair such as b to c and c to b I just want to select one of them. Either one will do.
So my results should look like this:
source | destination
--------------------
a | b
b | c
c | d
So far I have implemented this in Java, doing some processing before executing the SQL statements through JDBC. However, this will become very tedious as the table grows.
I wonder if there is any way I can do this in SQL instead.
I tried SELECT DISTINCT source, destination FROM Hyperlink,
but it returns the unique ordered pairs (permutations). I need the unique unordered pairs (combinations).
Thanks!
This is easily achievable with the LEAST() and GREATEST() functions, which MySQL does support. The portable alternative is a CASE construct that picks the smaller and the greater value. With two columns this is fine, but the CASE version gets pretty messy once more columns are involved:
select distinct
case
when source < destination then source
else destination
end as source,
case
when source > destination then source
else destination
end as destination
from Hyperlink
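For readers coming from the Java side of the question, here is a minimal in-memory sketch of what the CASE query computes, assuming the rows have already been loaded into a list. The Link record and the method name are hypothetical, and records need Java 16 or later. It only illustrates the logic: each pair is rewritten as (smaller, greater) and a set drops the duplicates the way SELECT DISTINCT does. String.compareTo compares UTF-16 code units, which may not match the database's collation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HyperlinkCaseDedup {
    // Hypothetical stand-in for one row of the Hyperlink table.
    record Link(String source, String destination) {}

    // Mirrors the CASE query: put the smaller endpoint first, then let a
    // set drop duplicates the way SELECT DISTINCT does.
    static Set<Link> uniqueUndirected(List<Link> links) {
        Set<Link> result = new LinkedHashSet<>(); // keeps first-seen order
        for (Link l : links) {
            boolean inOrder = l.source().compareTo(l.destination()) <= 0;
            String smaller = inOrder ? l.source() : l.destination();
            String greater = inOrder ? l.destination() : l.source();
            result.add(new Link(smaller, greater));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("a", "b"), new Link("b", "c"),
                new Link("c", "d"), new Link("c", "b"));
        // Prints a | b, b | c and c | d, one per line.
        for (Link l : uniqueUndirected(links)) {
            System.out.println(l.source() + " | " + l.destination());
        }
    }
}
```

Note that this version always returns the pair in (smaller, greater) order, so a link stored only as c to b comes back as b | c, exactly as the CASE query would return it.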
Try the following query:
SELECT DISTINCT source, destination FROM Hyperlink
MINUS
SELECT destination, source FROM Hyperlink WHERE source < destination;
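This works in Oracle. The second SELECT lists, with its columns swapped, every link whose source sorts before its destination, so subtracting it removes the reverse copy of any pair stored in both directions, while links stored in only one direction survive. If you're using PostgreSQL, DB2 or T-SQL (SQL Server), use the EXCEPT keyword instead of MINUS.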
EDIT: MySQL has no equivalent of these keywords (EXCEPT was only added in MySQL 8.0.31), so there you'll have to work around it by selecting the values as suggested by Jim Riordan. I'm not deleting my answer in case anyone needs to do this in one of the other four major DBMSs.
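The same set arithmetic can be sketched in Java to see why the subtraction leaves exactly one copy of each pair. This is a minimal in-memory illustration, assuming the rows are already in a list; Link and the method name are hypothetical, records need Java 16 or later, and String.compareTo stands in for the database's ordering.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HyperlinkExceptDedup {
    // Hypothetical stand-in for one row of the Hyperlink table.
    record Link(String source, String destination) {}

    static Set<Link> uniqueUndirected(List<Link> links) {
        // First SELECT: SELECT DISTINCT source, destination
        Set<Link> all = new LinkedHashSet<>(links);
        // Second SELECT: swapped columns, only where source < destination
        Set<Link> reversed = new LinkedHashSet<>();
        for (Link l : links) {
            if (l.source().compareTo(l.destination()) < 0) {
                reversed.add(new Link(l.destination(), l.source()));
            }
        }
        // MINUS / EXCEPT: drop every row that appears in the swapped set
        all.removeAll(reversed);
        return all;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("a", "b"), new Link("b", "c"),
                new Link("c", "d"), new Link("c", "b"));
        // Prints a | b, b | c and c | d, one per line.
        for (Link l : uniqueUndirected(links)) {
            System.out.println(l.source() + " | " + l.destination());
        }
    }
}
```

A row whose source sorts first can never appear in the swapped set, so it always survives; a row whose source sorts second is removed only when its forward twin exists.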
You can use the union of two separate join queries like so:
SELECT
lhs.source, lhs.destination
FROM Hyperlink lhs
LEFT OUTER JOIN Hyperlink rhs
ON rhs.source = lhs.destination
AND rhs.destination = lhs.source
WHERE rhs.source IS NULL
UNION
SELECT
lhs.source, lhs.destination
FROM Hyperlink lhs
JOIN Hyperlink rhs
ON rhs.source = lhs.destination
AND rhs.destination = lhs.source
WHERE lhs.source <= lhs.destination
ORDER BY source;
The first query gets the links that have no reverse link (no row going from their destination back to their source). The second gets the links that do have a reverse link, but keeps only the copy whose source sorts first, so each two-way pair is returned once. It's probably not the fastest implementation, but making sure you have indexes on the source and destination columns will help it along. Whether it performs well enough for you depends on how big the Hyperlink table is or is likely to get.
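A minimal in-memory Java sketch of the two cases a UNION like this combines, assuming the rows are already in a list (Link and the method name are hypothetical; records need Java 16 or later): a link is kept when no link runs in the opposite direction, or when one does and this copy's source sorts first. A HashSet makes each reverse lookup constant time on average, which is roughly the job the indexes do for the self-join in the database.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HyperlinkJoinDedup {
    // Hypothetical stand-in for one row of the Hyperlink table.
    record Link(String source, String destination) {}

    static Set<Link> uniqueUndirected(List<Link> links) {
        // Plays the part of the self-join: answers "is there a reverse link?"
        Set<Link> present = new HashSet<>(links);
        Set<Link> result = new LinkedHashSet<>(); // dedupes like UNION
        for (Link l : links) {
            boolean hasReverse = present.contains(new Link(l.destination(), l.source()));
            boolean sourceFirst = l.source().compareTo(l.destination()) <= 0;
            // Case 1: no reverse link, keep the row as-is.
            // Case 2: reverse link exists, keep only the copy whose source sorts first.
            if (!hasReverse || sourceFirst) {
                result.add(l);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("a", "b"), new Link("b", "c"),
                new Link("c", "d"), new Link("c", "b"));
        // Prints a | b, b | c and c | d, one per line.
        for (Link l : uniqueUndirected(links)) {
            System.out.println(l.source() + " | " + l.destination());
        }
    }
}
```

A pair stored in both directions with no other links (for example x to y and y to x) still yields exactly one row, x | y.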