I am trying to write a query (PostgreSQL) to get "Movies with highest number of awards in year 2012."
I have following tables:
CREATE TABLE Award(
ID_AWARD bigserial CONSTRAINT Award_pk PRIMARY KEY,
award_name VARCHAR(90),
category VARCHAR(90),
award_year integer,
CONSTRAINT award_unique UNIQUE (award_name, category, award_year));
CREATE TABLE AwardWinner(
ID_AWARD integer,
ID_ACTOR integer,
ID_MOVIE integer,
CONSTRAINT AwardWinner_pk PRIMARY KEY (ID_AWARD));
And I written following query, which gives correct results, but there's is quite a lot of code duplication I think.
select * from
(select id_movie, count(id_movie) as awards
from Award natural join awardwinner
where award_year = 2012 group by id_movie) as SUB
where awards = (select max(count) from
(select id_movie, count(id_movie)
from Award natural join awardwinner
where award_year = 2012 group by id_movie) as SUB2);
So SUB
and SUB2
are exactly the same subquery. Is there a better way to do this?
Well you can use common table expression to avoid code duplication:
with cte_s as (
select id_movie, count(id_movie) as awards
from Award natural join awardwinner
where award_year = 2012
group by id_movie
)
select
sub.id_movie, sub.awards
from cte_s as sub
where sub.awards = (select max(sub2.awards) from cte_s as sub2)
or you can do something like this with window function (untested, but I think PostgreSQL allows this):
with cte_s as (
select
id_movie,
count(id_movie) as awards,
max(count(id_movie)) over() as max_awards
from Award natural join awardwinner
where award_year = 2012
group by id_movie
)
select id_movie
from cte_s
where max_awards = awards
Another way to do this could be to use rank() function (untested, may be you have to use two cte instead of one):
with cte_s as (
select
id_movie,
count(id_movie) as awards,
rank() over(order by count(id_movie) desc) as rnk
from Award natural join awardwinner
where award_year = 2012
group by id_movie
)
select id_movie
from cte_s
where rnk = 1
update When I've created this answer, my main goal was to show how to use cte to avoid code duplication. In genearal, it's better to avoid using cte more than one time in query if it's possible - first query uses 2 table scan (or index seek) and second and third uses only one, so I've should specify that it's better to go with these queries. Anyway, @Erwin made this tests in his answer. Just to add to his great major points:
natural join
because of error-prone nature of this. Actually, my main RDBMS is SQL Server which are not support it so I'm more used to explicit outer/inner join
.awards
for year 2012 and just filter rows from awardwinner
), I prefer not to use join
, but use exists
or in
instead, it seems more logical for me.with cte_s as (
select
aw.id_movie,
count(*) as awards,
rank() over(order by count(*) desc) as rnk
from awardwinner as aw
where
exists (
select *
from award as a
where a.id_award = aw.id_award and a.award_year = 2012
)
group by aw.id_movie
)
select id_movie
from cte_s
where rnk = 1
SELECT id_movie, awards
FROM (
SELECT aw.id_movie, count(*) AS awards
,rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
FROM award a
JOIN awardwinner aw USING (id_award)
WHERE a.award_year = 2012
GROUP BY aw.id_movie
) sub
WHERE rnk = 1;
This should be simpler and faster than suggestions so far. Test with EXPLAIN ANALYZE
.
There are cases where CTEs are instrumental to avoid code duplication. But not in this time: a subquery does the job just fine and is usually faster.
You can run a window function OVER an aggregate function on the same query level. That's why this works:
rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk
I'd suggest to use explicit column names in the JOIN condition instead of NATURAL JOIN
, which is prone to breakage if you later change / add columns to the underlying tables.
The JOIN condition with USING
is almost as short, but doesn't break as easily.
Since id_movie
cannot be NULL (ruled out by the JOIN condition and also part of the pk) it is shorter ans slightly faster to use count(*)
instead. Same result.
Shorter and faster, yet, if you only need one winner:
SELECT aw.id_movie, count(*) AS awards
FROM award a
JOIN awardwinner aw USING (id_award)
WHERE a.award_year = 2012
GROUP BY 1
ORDER BY 2 DESC, 1 -- as tie breaker
LIMIT 1
Using positional references (1
, 2
) here as shorthand.
I added id_movie
to ORDER BY
as tie breaker in case multiple movies should qualify for the win.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With