Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is happening when using DISTINCT?

Here is my table and the data contained in it:

Table: first

+----------+------+
| first_id | data |
+----------+------+
|        1 |    5 |
|        2 |    6 |
|        3 |    7 |
|        4 |    6 |
|        5 |    7 |
|        6 |    5 |
|        7 |    7 |
|        8 |    6 |
|        9 |    5 |
|       10 |    7 |
+----------+------+

Table: second
+-----------+----------+----------+
| second_id | first_id | third_id |
+-----------+----------+----------+
|         1 |        1 |        2 |
|         2 |        2 |        3 |
|         3 |        3 |        4 |
|         4 |        4 |        2 |
|         5 |        5 |        3 |
|         6 |        6 |        4 |
|         7 |        7 |        2 |
|         8 |        8 |        2 |
|         9 |        9 |        4 |
|        10 |       10 |        4 |
+-----------+----------+----------+

My intention is to get the list of third_ids ordered by data field. Now, I ran the following query for that.

SELECT
    third_id, data
FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY 
    data ASC;

And I get the following result as expected.

+----------+------+
| third_id | data |
+----------+------+
|        4 |    5 |
|        2 |    5 |
|        4 |    5 |
|        2 |    6 |
|        3 |    6 |
|        2 |    6 |
|        2 |    7 |
|        4 |    7 |
|        4 |    7 |
|        3 |    7 |
+----------+------+

The following query is also work as expected.

SELECT 
    third_id
FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY 
    data ASC;

with output

+----------+
| third_id |
+----------+
|        4 |
|        2 |
|        4 |
|        2 |
|        3 |
|        2 |
|        2 |
|        4 |
|        4 |
|        3 |
+----------+

Then I ran the following.

SELECT DISTINCT
    third_id
FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY 
    data ASC;

But, here I get an unexpected result:

+----------+
| third_id |
+----------+
|        2 |
|        3 |
|        4 |
+----------+

Here, 3 must be after 2 and 4, since I am ordering on the data field. What am I doing wrong? Or do I have to go for a different strategy.

Note: This scenario happens on my project. The tables provided here doesn't belong to original database. It is created by me to explain the problem. Original tables contain thousands of rows. I am inserting database dump if you would like to experiment with the data:

--
-- Table structure for table `first`
--

CREATE TABLE IF NOT EXISTS `first` (
  `first_id` int(11) NOT NULL AUTO_INCREMENT,
  `data` int(11) NOT NULL,
  PRIMARY KEY (`first_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;

--
-- Dumping data for table `first`
--

INSERT INTO `first` (`first_id`, `data`) VALUES
(1, 5),
(2, 6),
(3, 7),
(4, 6),
(5, 7),
(6, 5),
(7, 7),
(8, 6),
(9, 5),
(10, 7);
--
-- Table structure for table `second`
--

CREATE TABLE IF NOT EXISTS `second` (
  `second_id` int(11) NOT NULL AUTO_INCREMENT,
  `first_id` int(11) NOT NULL,
  `third_id` int(11) NOT NULL,
  PRIMARY KEY (`second_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;

--
-- Dumping data for table `second`
--

INSERT INTO `second` (`second_id`, `first_id`, `third_id`) VALUES
(1, 1, 2),
(2, 2, 3),
(3, 3, 4),
(4, 4, 2),
(5, 5, 3),
(6, 6, 4),
(7, 7, 2),
(8, 8, 2),
(9, 9, 4),
(10, 10, 4);
like image 720
Jomoos Avatar asked Apr 10 '12 12:04

Jomoos


3 Answers

You probably want to do something like

SELECT third_id
FROM first JOIN second USING (first_id)
GROUP BY third_id
ORDER BY aggregatesomething(data)

that is min(data) or max(data) or whatever.

like image 111
Michael Krelin - hacker Avatar answered Sep 28 '22 02:09

Michael Krelin - hacker


Doing a SELECT DISTINCT requires the database to order the values in the column(s) as that is the most efficient way to find the distinct values. As far as I'm aware ORDER BY clauses that do not contain columns that are outputted in the query do not get honoured (SQL SERVER won't accept the query) as it is not clear what it would mean to order by something that did not participate.

like image 36
satnhak Avatar answered Sep 28 '22 02:09

satnhak


You may use a subquery -

SELECT DISTINCT third_id FROM (
  SELECT
    third_id
  FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
  ORDER BY 
    data ASC
) t;

It will help to select and sort all data firstly, then to select distinct values.

like image 30
Devart Avatar answered Sep 28 '22 01:09

Devart