Select distinct combinations from two columns

Tags: sql, mysql

I have two columns, source and destination in table Hyperlink, to store the source and destination of hyperlinks.

source | destination 
-------------------- 
  a    |  b 
  b    |  c 
  c    |  d 
  c    |  b 

There are two hyperlinks involving both b and c. The difference between the two hyperlinks is the direction of the hyperlink. However, my objective is to retrieve unique hyperlinks, no matter which direction. So for hyperlinks such as from b to c and from c to b, I just want to select one of them. Any one would do.

So my results should look like this:

source | destination 
-------------------- 
  a    |  b 
  b    |  c 
  c    |  d 

So far I am able to implement this in Java, with some processing before I execute SQL statements using JDBC. However, this is going to be very tedious when the table becomes very large.

I wonder if there is any way I can do this in SQL instead.

I tried SELECT DISTINCT source, destination FROM Hyperlink, but it returns the unique permutations, so both b to c and c to b come back. I need the unique combinations.
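
To make the problem concrete, here is a minimal sketch that loads the four sample rows and runs the plain DISTINCT query. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL, so it is only meant to illustrate the behaviour, not to mirror a real setup.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO Hyperlink VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")],
)

# DISTINCT compares (source, destination) as an ordered pair,
# so ('b', 'c') and ('c', 'b') count as two different rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT source, destination FROM Hyperlink "
    "ORDER BY source, destination"
).fetchall()
print(distinct_rows)
```

All four rows come back, including both b/c links, which is exactly the duplication the question wants to avoid.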

Thanks!

paperclip asked Jul 29 '12 08:07


3 Answers

This is easily achievable with the least() and greatest() functions. MySQL does provide both, so select distinct least(source, destination), greatest(source, destination) works there. On a DBMS that lacks them you need a CASE construct to get the smaller/greater value. With two columns this is OK, but the CASE solution gets pretty messy once more columns are involved:

select distinct 
          case 
            when source < destination then source 
            else destination 
          end as source,
          case 
            when source > destination then source 
            else destination 
          end as destination
from Hyperlink;
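
As a rough check of the idea, the sketch below runs the CASE query against the sample rows from the question. It assumes an in-memory SQLite database (Python's sqlite3 module) instead of MySQL. SQLite has no functions named least()/greatest(); its two-argument scalar MIN(x, y) and MAX(x, y) play that role, so the second query uses those for comparison.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO Hyperlink VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")],
)

# Normalise every link so the smaller value comes first, then de-duplicate.
case_rows = conn.execute("""
    SELECT DISTINCT
        CASE WHEN source < destination THEN source ELSE destination END AS source,
        CASE WHEN source > destination THEN source ELSE destination END AS destination
    FROM Hyperlink
    ORDER BY 1, 2
""").fetchall()

# Same normalisation using SQLite's scalar MIN/MAX (two arguments, not the aggregate).
minmax_rows = conn.execute("""
    SELECT DISTINCT MIN(source, destination), MAX(source, destination)
    FROM Hyperlink
    ORDER BY 1, 2
""").fetchall()

print(case_rows)
print(minmax_rows)
```

Both queries return the three rows (a, b), (b, c), (c, d). Note that a link stored only as c to b would be reported as b to c, since the direction is discarded by the normalisation.
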
a_horse_with_no_name answered Oct 19 '22 13:10


Try the following query:

SELECT DISTINCT source, destination FROM Hyperlink
MINUS
SELECT destination, source FROM Hyperlink WHERE source < destination;

This works in Oracle. If you're using PostgreSQL, DB2 or T-SQL (SQL Server), use the EXCEPT keyword instead of MINUS.
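
The EXCEPT form can be tried out without any of those servers. The sketch below is an assumption-laden stand-in: it uses an in-memory SQLite database through Python's sqlite3 module, since SQLite also accepts EXCEPT, and loads the sample rows from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a DBMS that supports EXCEPT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO Hyperlink VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")],
)

# The second SELECT builds the reversed form of every "ascending" link;
# EXCEPT then removes any stored link that equals one of those reversals.
except_rows = sorted(conn.execute("""
    SELECT DISTINCT source, destination FROM Hyperlink
    EXCEPT
    SELECT destination, source FROM Hyperlink WHERE source < destination
""").fetchall())

print(except_rows)
```

On the sample data only (c, b) is removed, because it is the reversal of (b, c), leaving (a, b), (b, c) and (c, d). The result is sorted in Python because the order of a compound query's rows is not guaranteed without ORDER BY.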

EDIT: MySQL had no equivalent of these keywords when this was written (EXCEPT was only added in MySQL 8.0.31). On older versions you'll have to work around it by selecting the values as suggested by Jim Riordan. I'm not going to delete my answer in case anyone needs to do it in any of the other four major DBMSs.

toniedzwiedz answered Oct 19 '22 12:10


You can use the union of two separate join queries like so:

SELECT
lhs.source, lhs.destination
FROM Hyperlink lhs
LEFT OUTER JOIN Hyperlink rhs
ON rhs.source = lhs.destination
WHERE rhs.source IS NULL
UNION
SELECT
lhs.source, lhs.destination
FROM Hyperlink lhs
JOIN Hyperlink rhs
ON rhs.source = lhs.destination
WHERE rhs.destination <> lhs.source
ORDER BY source;

The first query gets the links whose destination never appears as a source. The second gets the links whose destination is the source of another link that does not point straight back. It's probably not the fastest implementation, but having indexes on the source and destination columns will help it along. Whether it performs well enough for you depends on how big the Hyperlink table is or is likely to get.
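
Before relying on the union query, it may be worth running it against a couple of small data sets. The sketch below does that under some assumptions: an in-memory SQLite database (Python's sqlite3 module) stands in for MySQL, run_union_query is a hypothetical helper name, the trailing ORDER BY is replaced by sorting in Python, and the second data set is a made-up edge case that is not in the question.

```python
import sqlite3

# The union query from this answer, minus ORDER BY (rows are sorted in Python).
UNION_QUERY = """
SELECT lhs.source, lhs.destination
FROM Hyperlink lhs
LEFT OUTER JOIN Hyperlink rhs ON rhs.source = lhs.destination
WHERE rhs.source IS NULL
UNION
SELECT lhs.source, lhs.destination
FROM Hyperlink lhs
JOIN Hyperlink rhs ON rhs.source = lhs.destination
WHERE rhs.destination <> lhs.source
"""

def run_union_query(links):
    """Hypothetical helper: load the links into a fresh table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Hyperlink (source TEXT, destination TEXT)")
    conn.executemany("INSERT INTO Hyperlink VALUES (?, ?)", links)
    rows = sorted(conn.execute(UNION_QUERY).fetchall())
    conn.close()
    return rows

# Sample data from the question.
sample = run_union_query([("a", "b"), ("b", "c"), ("c", "d"), ("c", "b")])

# Made-up edge case: a reversed pair whose nodes have no other outgoing links.
isolated_pair = run_union_query([("b", "c"), ("c", "b")])

print(sample)
print(isolated_pair)
```

On the question's data this gives (a, b), (b, c), (c, d) as wanted. For the isolated pair it returns no rows at all, because neither branch of the union matches either link, so the query seems to depend on what other links surround a reversed pair. If your data can contain such pairs, check the results carefully.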

Jim Riordan answered Oct 19 '22 12:10