
GROUP BY does not remove duplicates

Tags: mysql, group-by

I've coded a watchlist system. In the overview of a user's watchlist, they see a list of records, but the list shows duplicates even though the database holds only the exact, correct number of rows.

I've tried GROUP BY watch.watch_id and GROUP BY rec.record_id, but none of the groupings I've tried removes the duplicates. I'm not sure what I'm doing wrong.

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch 

    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE  watch.user_id = 1 
GROUP BY watch.watch_id
LIMIT 0, 25

The watchlist table looks like this:

+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
|       13 |       1 |        22 | 1314038274 |
|       14 |       1 |        25 | 1314038995 |
+----------+---------+-----------+------------+
MacMac asked Aug 22 '11 19:08


2 Answers

GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.

If you need to combine rows that are duplicates in only some columns, use GROUP BY, but you need to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:

SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP BY watch.watch_id

EDIT

The OP asked for some clarification.

Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.

First, you might have completely duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 3
3 4 5

Then simply use DISTINCT:

SELECT DISTINCT * FROM V;

a b c
- - -
1 2 3
3 4 5
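
To try the fully-duplicate case without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption made for convenience, and V is created here as a plain table rather than a view. The ORDER BY is only there to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 3), (3, 4, 5)],  # the first two rows are exact duplicates
)

all_rows = conn.execute("SELECT * FROM V ORDER BY a, b, c").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT * FROM V ORDER BY a, b, c").fetchall()

print(all_rows)       # [(1, 2, 3), (1, 2, 3), (3, 4, 5)]
print(distinct_rows)  # [(1, 2, 3), (3, 4, 5)]
```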

Or, you might have partially duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 6
3 4 5

Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:

SELECT DISTINCT a,b FROM V;

a b
- -
1 2
3 4

Or you could perform some kind of aggregation on them. You could add them up:

SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;

a b tot
- - ---
1 2 9
3 4 5

You could pick the smallest value:

SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;

a b first
- - -----
1 2 3
3 4 5

Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
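
The partially-duplicate options above can be run side by side. This is a sketch under the same assumption as before (Python's sqlite3 module instead of MySQL, with V as a plain table); the run helper and the column aliases are names made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO V VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 6), (3, 4, 5)],  # first two rows agree on a and b only
)

def run(sql):
    """Hypothetical helper: execute a query and return all rows as a list."""
    return conn.execute(sql).fetchall()

# Option 1: discard column c entirely.
dropped = run("SELECT DISTINCT a, b FROM V ORDER BY a, b")
# Option 2: keep c, but say how to combine its values within each (a, b) group.
summed = run("SELECT a, b, SUM(c) AS tot FROM V GROUP BY a, b ORDER BY a, b")
smallest = run("SELECT a, b, MIN(c) AS smallest FROM V GROUP BY a, b ORDER BY a, b")
averaged = run("SELECT a, b, AVG(c) AS mean FROM V GROUP BY a, b ORDER BY a, b")

print(dropped)   # [(1, 2), (3, 4)]
print(summed)    # [(1, 2, 9), (3, 4, 5)]
print(smallest)  # [(1, 2, 3), (3, 4, 5)]
print(averaged)  # [(1, 2, 4.5), (3, 4, 5.0)]
```

Each query returns one row per (a, b) pair; the only difference is what happens to c.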

What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL, unless the ONLY_FULL_GROUP_BY SQL mode is on, which has been the default since MySQL 5.7.5). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."

Michael Lorton answered Sep 18 '22 11:09


While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.

You're selecting fields that are not in the GROUP BY. Although MySQL allows this, which row's values it returns for the non-GROUP BY fields is undefined.
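
One way to keep the result well defined is to make sure every selected column is either grouped or aggregated. The sketch below uses Python's sqlite3 module in place of MySQL, with a trimmed-down records table (only record_id, user_id, and city) and made-up rows. In particular, the duplicated record 22 is purely hypothetical, added so the join produces extra rows; it is not a claim about what the asker's data contains. The COUNT(*) column also shows which watchlist rows the join multiplies.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watchlist (watch_id INTEGER, user_id INTEGER,
                            record_id INTEGER, watch_date INTEGER);
    CREATE TABLE records (record_id INTEGER, user_id INTEGER, city TEXT);

    INSERT INTO watchlist VALUES (13, 1, 22, 1314038274), (14, 1, 25, 1314038995);
    -- Hypothetical data: record 22 appears twice, so the join fans out.
    INSERT INTO records VALUES (22, 7, 'Leeds'), (22, 7, 'York'), (25, 8, 'Bath');
""")

rows = conn.execute("""
    SELECT watch.watch_id,
           COUNT(*)      AS joined_rows,  -- how many joined rows fell into this group
           MIN(rec.city) AS city          -- explicit rule for picking one city
    FROM watchlist watch
    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
    WHERE watch.user_id = 1
    GROUP BY watch.watch_id
    ORDER BY watch.watch_id
""").fetchall()

print(rows)  # [(13, 2, 'Leeds'), (14, 1, 'Bath')]
```

A joined_rows value above 1 points at the watchlist entry whose join is producing the duplicates.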

If you want to do this with a GROUP BY, try something more like the following:

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch

    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
    -- one watch_id per watchlist entry belonging to this user
    SELECT watch_id FROM watchlist WHERE user_id = 1
    GROUP BY watch_id)
LIMIT 0, 25
Jason McCreary answered Sep 18 '22 11:09