Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find movies with highest number of awards in certain year - code duplication

I am trying to write a query (PostgreSQL) to get "Movies with highest number of awards in year 2012."

I have following tables:

CREATE TABLE Award(
    ID_AWARD bigserial CONSTRAINT Award_pk PRIMARY KEY,
    award_name VARCHAR(90),
    category VARCHAR(90),
    award_year integer,
    CONSTRAINT award_unique UNIQUE (award_name, category, award_year));

CREATE TABLE AwardWinner(
    ID_AWARD integer,
    ID_ACTOR integer,
    ID_MOVIE integer,
    CONSTRAINT AwardWinner_pk PRIMARY KEY (ID_AWARD));

And I written following query, which gives correct results, but there's is quite a lot of code duplication I think.

select * from 
(select id_movie, count(id_movie) as awards 
from Award natural join awardwinner 
where award_year = 2012 group by id_movie) as SUB
where awards = (select max(count) from 
(select id_movie, count(id_movie) 
from Award natural join awardwinner 
where award_year = 2012 group by id_movie) as SUB2);

So SUB and SUB2 are exactly the same subquery. Is there a better way to do this?

like image 870
Kuba Spatny Avatar asked Jan 11 '23 21:01

Kuba Spatny


2 Answers

Well you can use common table expression to avoid code duplication:

with cte_s as (
   select id_movie, count(id_movie) as awards
   from Award natural join awardwinner 
   where award_year = 2012
   group by id_movie
)
select
    sub.id_movie, sub.awards
from cte_s as sub
where sub.awards = (select max(sub2.awards) from cte_s as sub2)

or you can do something like this with window function (untested, but I think PostgreSQL allows this):

with cte_s as (
    select
        id_movie,
        count(id_movie) as awards,
        max(count(id_movie)) over() as max_awards
    from Award natural join awardwinner 
    where award_year = 2012
    group by id_movie
)
select id_movie
from cte_s
where max_awards = awards

Another way to do this could be to use rank() function (untested, may be you have to use two cte instead of one):

with cte_s as (
    select
        id_movie,
        count(id_movie) as awards,
        rank() over(order by count(id_movie) desc) as rnk
    from Award natural join awardwinner 
    where award_year = 2012
    group by id_movie
)
select id_movie
from cte_s
where rnk = 1

update When I've created this answer, my main goal was to show how to use cte to avoid code duplication. In genearal, it's better to avoid using cte more than one time in query if it's possible - first query uses 2 table scan (or index seek) and second and third uses only one, so I've should specify that it's better to go with these queries. Anyway, @Erwin made this tests in his answer. Just to add to his great major points:

  • I also advice against natural join because of error-prone nature of this. Actually, my main RDBMS is SQL Server which are not support it so I'm more used to explicit outer/inner join.
  • It's good habit to always use aliases in your queries, so you can avoid strange results.
  • This could be totally subjective thing, but usually if I'm using some table only to filter out rows from main table of the query (like in this query, we just want to get awards for year 2012 and just filter rows from awardwinner), I prefer not to use join, but use exists or in instead, it seems more logical for me.
So final query could be:
with cte_s as (
    select
        aw.id_movie,
        count(*) as awards,
        rank() over(order by count(*) desc) as rnk
    from awardwinner as aw
    where
        exists (
            select *
            from award as a
            where a.id_award = aw.id_award and a.award_year = 2012
        )
    group by aw.id_movie
)
select id_movie
from cte_s
where rnk = 1
like image 143
Roman Pekar Avatar answered Jan 14 '23 09:01

Roman Pekar


Get all winning movies

SELECT id_movie, awards
FROM  (
   SELECT aw.id_movie, count(*) AS awards
         ,rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
   FROM   award       a
   JOIN   awardwinner aw USING (id_award)
   WHERE  a.award_year = 2012
   GROUP  BY aw.id_movie
   ) sub
WHERE  rnk = 1;

Major points

  • This should be simpler and faster than suggestions so far. Test with EXPLAIN ANALYZE.

  • There are cases where CTEs are instrumental to avoid code duplication. But not in this time: a subquery does the job just fine and is usually faster.

  • You can run a window function OVER an aggregate function on the same query level. That's why this works:

    rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
    
  • I'd suggest to use explicit column names in the JOIN condition instead of NATURAL JOIN, which is prone to breakage if you later change / add columns to the underlying tables.
    The JOIN condition with USING is almost as short, but doesn't break as easily.

  • Since id_movie cannot be NULL (ruled out by the JOIN condition and also part of the pk) it is shorter ans slightly faster to use count(*) instead. Same result.

Just one movie

Shorter and faster, yet, if you only need one winner:

SELECT aw.id_movie, count(*) AS awards
FROM   award       a
JOIN   awardwinner aw USING (id_award)
WHERE  a.award_year = 2012
GROUP  BY 1
ORDER  BY 2 DESC, 1 -- as tie breaker
LIMIT  1

Using positional references (1, 2) here as shorthand.
I added id_movie to ORDER BY as tie breaker in case multiple movies should qualify for the win.

like image 27
Erwin Brandstetter Avatar answered Jan 14 '23 11:01

Erwin Brandstetter