What is happening when using DISTINCT?

Question

Here is my table and the data contained in it:

Table: first

+----------+------+
| first_id | data |
+----------+------+
|        1 |    5 |
|        2 |    6 |
|        3 |    7 |
|        4 |    6 |
|        5 |    7 |
|        6 |    5 |
|        7 |    7 |
|        8 |    6 |
|        9 |    5 |
|       10 |    7 |
+----------+------+

Table: second
+-----------+----------+----------+
| second_id | first_id | third_id |
+-----------+----------+----------+
|         1 |        1 |        2 |
|         2 |        2 |        3 |
|         3 |        3 |        4 |
|         4 |        4 |        2 |
|         5 |        5 |        3 |
|         6 |        6 |        4 |
|         7 |        7 |        2 |
|         8 |        8 |        2 |
|         9 |        9 |        4 |
|        10 |       10 |        4 |
+-----------+----------+----------+

My intention is to get the list of third_ids ordered by the data field. I ran the following query for that.

SELECT
    third_id, data
FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY 
    data ASC;

And I get the following result as expected.

+----------+------+
| third_id | data |
+----------+------+
|        4 |    5 |
|        2 |    5 |
|        4 |    5 |
|        2 |    6 |
|        3 |    6 |
|        2 |    6 |
|        2 |    7 |
|        4 |    7 |
|        4 |    7 |
|        3 |    7 |
+----------+------+

The following query also works as expected.

SELECT 
    third_id
FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY 
    data ASC;

with output

+----------+
| third_id |
+----------+
|        4 |
|        2 |
|        4 |
|        2 |
|        3 |
|        2 |
|        2 |
|        4 |
|        4 |
|        3 |
+----------+

Then I ran the following.

SELECT DISTINCT
    third_id
FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY 
    data ASC;

But, here I get an unexpected result:

+----------+
| third_id |
+----------+
|        2 |
|        3 |
|        4 |
+----------+

Here, 3 must be after 2 and 4, since I am ordering on the data field. What am I doing wrong? Or do I have to go for a different strategy?

Note: This scenario comes from my project. The tables provided here don't belong to the original database; I created them to explain the problem. The original tables contain thousands of rows. Here is a database dump if you would like to experiment with the data:

--
-- Table structure for table `first`
--

CREATE TABLE IF NOT EXISTS `first` (
  `first_id` int(11) NOT NULL AUTO_INCREMENT,
  `data` int(11) NOT NULL,
  PRIMARY KEY (`first_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;

--
-- Dumping data for table `first`
--

INSERT INTO `first` (`first_id`, `data`) VALUES
(1, 5),
(2, 6),
(3, 7),
(4, 6),
(5, 7),
(6, 5),
(7, 7),
(8, 6),
(9, 5),
(10, 7);

--
-- Table structure for table `second`
--

CREATE TABLE IF NOT EXISTS `second` (
  `second_id` int(11) NOT NULL AUTO_INCREMENT,
  `first_id` int(11) NOT NULL,
  `third_id` int(11) NOT NULL,
  PRIMARY KEY (`second_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=11 ;

--
-- Dumping data for table `second`
--

INSERT INTO `second` (`second_id`, `first_id`, `third_id`) VALUES
(1, 1, 2),
(2, 2, 3),
(3, 3, 4),
(4, 4, 2),
(5, 5, 3),
(6, 6, 4),
(7, 7, 2),
(8, 8, 2),
(9, 9, 4),
(10, 10, 4);

Michael Krelin - hacker · Accepted Answer

You probably want to do something like

SELECT third_id
FROM first JOIN second USING (first_id)
GROUP BY third_id
ORDER BY MIN(data)

where MIN(data) stands in for whichever aggregate fits: MIN(data), MAX(data), or whatever.
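
As a rough sketch of the GROUP BY approach against the sample tables from the question, the query below orders each third_id by its smallest data value. Choosing MIN is an assumption about what "ordered by data" should mean for a group of rows; the min_data and max_data aliases and the third_id tie-breaker are added here purely for illustration.

```sql
-- Sketch only: assumes the `first` and `second` tables from the question's dump.
-- min_data / max_data are illustrative aliases, not part of the original schema.
SELECT
    third_id,
    MIN(data) AS min_data,   -- smallest data value seen for this third_id
    MAX(data) AS max_data    -- largest data value, shown for comparison
FROM
    first JOIN second USING (first_id)
GROUP BY
    third_id
ORDER BY
    min_data ASC,
    third_id ASC;            -- tie-breaker so groups with equal min_data sort predictably

-- With the sample data this returns:
--   third_id | min_data | max_data
--          2 |        5 |        7
--          4 |        5 |        7
--          3 |        6 |        7
```

Ordering by max_data instead would not separate the groups at all, since every third_id in the sample has a maximum of 7. That is why the choice of aggregate has to be made explicitly.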

satnhak · Answer

Doing a SELECT DISTINCT typically means the database sorts or groups the values in the selected column(s), since that is an efficient way to find the distinct values. As far as I'm aware, ORDER BY clauses that reference columns not output by the query are not honoured (SQL Server won't accept such a query at all), as it is not clear what it would mean to order by something that did not participate: each distinct third_id here comes from several rows with different data values.
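
One way to see the ambiguity described above is to put the sort column into the select list, which is the form stricter databases require. This is a sketch against the question's sample tables; the extra third_id sort key is an addition made here so that ties come out in a fixed order.

```sql
-- Sketch only: assumes the `first` and `second` tables from the question's dump.
-- DISTINCT now applies to the (third_id, data) pair, not to third_id alone.
SELECT DISTINCT
    s.third_id,
    f.data
FROM
    first f JOIN second s ON ( s.first_id = f.first_id )
ORDER BY
    f.data ASC,
    s.third_id ASC;   -- added tie-breaker for rows sharing the same data value

-- With the sample data this returns seven rows:
--   third_id | data
--          2 |    5
--          4 |    5
--          2 |    6
--          3 |    6
--          2 |    7
--          3 |    7
--          4 |    7
```

Each third_id appears next to more than one data value, so there is no single data value to sort a lone third_id by unless an aggregate picks one.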

Devart · Answer

You can use a subquery:

SELECT DISTINCT third_id FROM (
  SELECT
    third_id
  FROM 
    first f JOIN second s ON ( s.first_id = f.first_id )
  ORDER BY 
    data ASC
) t;

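This selects and sorts all the rows first, then picks the distinct values from that result. Note that the outer query has no ORDER BY of its own, so the final order is not guaranteed by SQL and depends on how MySQL processes the derived table.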
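
If the subquery shape is preferred, a variant that does not depend on the inner sort surviving is to compute the sort key inside the derived table and order on it in the outer query. This is a sketch under assumptions: the per_third alias and the first_data column name are hypothetical, and MIN(data) is one possible reading of "ordered by data".

```sql
-- Sketch only: assumes the `first` and `second` tables from the question's dump.
-- per_third and first_data are hypothetical names introduced for this example.
SELECT
    per_third.third_id
FROM (
    SELECT
        s.third_id,
        MIN(f.data) AS first_data   -- one sort key per third_id
    FROM
        first f JOIN second s ON ( s.first_id = f.first_id )
    GROUP BY
        s.third_id
) per_third
ORDER BY
    per_third.first_data ASC,
    per_third.third_id ASC;         -- tie-breaker for equal first_data

-- With the sample data this returns third_id 2, 4, 3 in that order.
```

Because the ORDER BY sits on the outermost query, the result order is defined by the statement itself and does not rely on how the derived table is materialized.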

Tags: sql, select, mysql, distinct

Jomoos
