Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Help: Counting Rows in a Single Query With a Nested SELECT

Tags:

sql

I'm looking for a better way to do the following query. I have a table that looks like this:

game_id | home_team_id | away_team_id
1       | 100          | 200
2       | 200          | 300
3       | 200          | 400
4       | 300          | 100
5       | 100          | 400

And I want to write a query that counts the number of home games and away games for each team and outputs the following:

team_id | home_games | away_games
100     | 2          | 1
200     | 2          | 1
300     | 1          | 1
400     | 0          | 2

Right now, I wrote this monstrosity that works, but it's slow (I know it's pulling the entire 2,800 row from the table twice).

SELECT 
  home_team_id as team_id,
  (SELECT count(*) FROM `game` WHERE home_team_id = temp_game.home_team_id) as home_games,
  (SELECT count(*) FROM `game` WHERE home_team_id = temp_game.away_team_id) as away_games
  FROM (SELECT * FROM `game`) as temp_game
  GROUP BY home_team_id

Can a SQL guru help me knock out a better way? I think my problem is that I don't understand how to get a distinct list of the team IDs to throw at the count queries. I bet there's a better way with a better placed, nested SELECT. Thanks in advance!

like image 633
Greg Avatar asked Jan 07 '09 20:01

Greg


People also ask

How do I COUNT rows in SELECT query?

In SQL, you can make a database query and use the COUNT function to get the number of rows for a particular group in the table. Here is the basic syntax: SELECT COUNT(column_name) FROM table_name; COUNT(column_name) will not include NULL values as part of the count.

How do I COUNT rows in SQL with conditions?

The SQL COUNT() function returns the number of rows in a table satisfying the criteria specified in the WHERE clause. It sets the number of rows or non NULL column values. COUNT() returns 0 if there were no matching rows.

Can we use COUNT with group by clause?

The count() function with the GROUP BY clause is used to count the data which were grouped on a particular attribute of the table.


2 Answers

It's cleaner if you have another table team with team_id and team_name.

SELECT team_id, team_name, 
     sum(team_id = home_team_id) as home_games, 
     sum(team_id = away_team_id) as away_games
 FROM game, team
 GROUP BY team_id

What's going on: the no WHERE clause causes a Cartesian Product between the two tables; we group by team_id to get back to one row per team. Now there are all the rows from the game table for each team_id so you need to count them but the SQL count function isn't quite right (it would count all the rows or all the distinct rows). So we say team_id = home_team_id which resolves to 1 or 0 and we use sum to add up the 1's.

The team_name is just because it's geeky to say that 'team 200 had 20 home games' when we ought to say that 'Mud City Stranglers had 20 home games'.

PS. this will work even if there are no games (often a problem in SQL where there is a team with 0 games and that row will not show up because the join fails).

like image 199
Frank Flynn Avatar answered Nov 14 '22 23:11

Frank Flynn


If you want the distinct list of teams, you have to select from the game table twice, unioning the home and the away teams (theoretically, one team could play all its games on the road or at home, if you have logic that prevents that, then you could adjust this query):

select home_team_id as team_id from game union
select away_team_id as team_id from game

The union operator will make sure you only get distinct elements in the return set (unless you use union all)

From there, you can use left outer joins to aggregate your data:

select
    u.team_id, count(h.game_id) as home_games, count(a.game_id) as away_games
from
    (
        select home_team_id as team_id from game union
        select away_team_id as team_id from game
    ) as u
        left outer join game as h on h.home_team_id = u.team_id
        left outer join game as a on a.away_team_id = u.team_id
group by
    u.team_id

If you want to reduce your table scans even further (the above will produce four), you can add more code, but it will cost you. You can get a list of rows with the team_id, and whether or not the game was played at home or away:

select
    case ha.home when 0 then g.away_team_id else g.home_team_id end as team_id,
    case ha.home when 0 then 0 else 1 end as home_games,
    case ha.home when 0 then 1 else 0 end as away_games
from
    game as g, (select 0 as home union select 1 as home) as ha

From there, you can simply sum up the games at home and away for each team:

select
    t.team_id, sum(t.home_games) as home_games, sum(t.away_games) as away_games
from
    (
        select
            case ha.home when 0 then g.away_team_id else g.home_team_id end as team_id,
            case ha.home when 0 then 0 else 1 end as home_games,
            case ha.home when 0 then 1 else 0 end as away_games
        from
            game as g, (select 0 as home union select 1 as home) as ha
    ) as t
group by
    t.team_id

This will result in a single table scan.

like image 41
casperOne Avatar answered Nov 14 '22 21:11

casperOne