
Complex SQL query with multiple tables and relations

Tags: sql, postgresql

In this query, I have to list pairs of players, with their playerID and playerName, who play for exactly the same teams. If a player plays for 3 teams, the other has to play for exactly the same 3 teams, no fewer and no more. If two players currently do not play for any team, they should also be included. The query should return (playerID1, playerName1, playerID2, playerName2) with no repetition: if player 1's info comes before player 2's, there should not be another tuple with player 2's info coming before player 1's.

For example, if player A plays for the Yankees and the Red Sox, and player B plays for the Yankees, the Red Sox, and the Dodgers, I should not get them. They both have to play for the Yankees and the Red Sox and for no other team. Right now my query returns a pair whenever the two players share any team.

Tables:
player(playerID: integer, playerName: string)
team(teamID: integer, teamName: string, sport: string)
plays(playerID: integer, teamID: integer)

Example data:
PLAYER    
playerID    playerName
1           Rondo
2           Allen
3           Pierce
4           Garnett
5           Perkins

TEAM      
teamID     teamName       sport
1          Celtics        Basketball
2          Lakers         Basketball
3          Patriots       Football
4          Red Sox        Baseball
5          Bulls          Basketball

PLAYS
playerID    teamID
1           1
1           2
1           3
2           1
2           3
3           1
3           3

So I should get this as the answer:

 2, Allen, 3, Pierce
 4, Garnett, 5, Perkins

2, Allen, 3, Pierce is an answer because both play exclusively for the Celtics and the Patriots. 4, Garnett, 5, Perkins is an answer because neither player plays for any team, and such pairs should be in the output.

Right now the query I have is:

SELECT p1.PLAYERID, 
       f1.PLAYERNAME, 
       p2.PLAYERID, 
       f2.PLAYERNAME 
FROM   PLAYER f1, 
       PLAYER f2, 
       PLAYS p1 
       FULL OUTER JOIN PLAYS p2 
                    ON p1.PLAYERID < p2.PLAYERID 
                       AND p1.TEAMID = p2.TEAMID 
GROUP  BY p1.PLAYERID, 
          f1.PLAYERID, 
          p2.PLAYERID, 
          f2.PLAYERID 
HAVING Count(p1.PLAYERID) = Count(*) 
       AND Count(p2.PLAYERID) = Count(*) 
       AND p1.PLAYERID = f1.PLAYERID 
       AND p2.PLAYERID = f2.PLAYERID; 

I am not 100% sure, but I think this finds players who share a team, whereas I want to find players who play for exactly the same set of teams, as explained above.

I am stuck on how to approach it from here. Any hints on how to approach this problem? Thanks for your time.

user2632133 asked Jul 31 '13 06:07


2 Answers

I believe this query will do what you want:

SELECT array_agg(players), player_teams
FROM (
  SELECT DISTINCT t1.t1player AS players, t1.player_teams
  FROM (
    SELECT
      p.playerid AS t1id,
      concat(p.playerid,':', p.playername, ' ') AS t1player,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t1
  INNER JOIN (
    SELECT
      p.playerid AS t2id,
      array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t2 ON t1.player_teams = t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams


Result:
PLAYERS               PLAYER_TEAMS
2:Allen,3:Pierce      1,3
4:Garnett,5:Perkins

It uses array_agg over the teamid values of each player in plays to match players with exactly the same set of teams. I included a column with the teams as an illustration; it can be dropped from the SELECT list without affecting the results, as long as it stays in the GROUP BY clause.

SQL Fiddle example. Tested with PostgreSQL 9.2.4.

EDIT: Fixed an error that duplicated rows.
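
For readers who want the (playerID1, playerName1, playerID2, playerName2) shape asked for in the question, the same exact-match condition can also be written without arrays: two players match when neither has a team the other lacks. The sketch below is an alternative technique (a double NOT EXISTS), not the array_agg query above. It is run through Python's sqlite3 module against an in-memory SQLite database only so that it is self-contained; the variable names `conn`, `EXACT_PAIRS` and `pairs` are illustrative, and the SQL itself uses nothing SQLite-specific.

```python
import sqlite3

# In-memory SQLite database holding the example data from the question.
# (SQLite stands in for PostgreSQL here purely to keep the sketch runnable.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (playerID INTEGER PRIMARY KEY, playerName TEXT);
CREATE TABLE plays  (playerID INTEGER, teamID INTEGER);
INSERT INTO player VALUES
    (1,'Rondo'),(2,'Allen'),(3,'Pierce'),(4,'Garnett'),(5,'Perkins');
INSERT INTO plays VALUES
    (1,1),(1,2),(1,3),(2,1),(2,3),(3,1),(3,3);
""")

EXACT_PAIRS = """
SELECT a.playerID, a.playerName, b.playerID, b.playerName
FROM player a
JOIN player b ON a.playerID < b.playerID      -- list each pair only once
WHERE NOT EXISTS (                            -- a has no team that b lacks
        SELECT 1 FROM plays pa
        WHERE pa.playerID = a.playerID
          AND NOT EXISTS (SELECT 1 FROM plays pb
                          WHERE pb.playerID = b.playerID
                            AND pb.teamID = pa.teamID))
  AND NOT EXISTS (                            -- b has no team that a lacks
        SELECT 1 FROM plays pb
        WHERE pb.playerID = b.playerID
          AND NOT EXISTS (SELECT 1 FROM plays pa
                          WHERE pa.playerID = a.playerID
                            AND pa.teamID = pb.teamID))
ORDER BY a.playerID, b.playerID
"""

pairs = conn.execute(EXACT_PAIRS).fetchall()
for row in pairs:
    print(row)
```

Players without any team satisfy both NOT EXISTS conditions trivially, so they pair up with each other without a separate branch.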

jpw answered Nov 15 '22 04:11


The OP probably isn't interested anymore, but in case somebody else finds it useful, here is a query in pure SQL that works (for me at least ;)):

SELECT m.p1, pr1.playerName, m.p2, pr2.playerName
FROM (
   SELECT plays1.playerID AS p1, plays2.playerID AS p2
   FROM plays plays1
   INNER JOIN plays plays2
      ON plays1.playerID < plays2.playerID AND plays1.teamID = plays2.teamID
   GROUP BY plays1.playerID, plays2.playerID
   -- the number of shared teams must equal each player's own number of teams
   HAVING COUNT(*) = (SELECT COUNT(*) FROM plays plays3 WHERE plays3.playerID = plays1.playerID)
      AND COUNT(*) = (SELECT COUNT(*) FROM plays plays4 WHERE plays4.playerID = plays2.playerID)
) m
INNER JOIN player pr1 ON pr1.playerID = m.p1
INNER JOIN player pr2 ON pr2.playerID = m.p2
UNION ALL
-- pairs of players who do not play for any team
SELECT m.pid, m.pname, n.pid2, n.pname2
FROM (SELECT p.playerID AS pid, p.playerName AS pname FROM player p
      LEFT JOIN plays pl ON p.playerID = pl.playerID WHERE pl.teamID IS NULL) m
INNER JOIN
     (SELECT p.playerID AS pid2, p.playerName AS pname2 FROM player p
      LEFT JOIN plays pl ON p.playerID = pl.playerID WHERE pl.teamID IS NULL) n
  ON m.pid < n.pid2
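
A count-based check of this kind depends on how the two team counts are compared. The sketch below is a hedged variant, not the answer's exact query: it folds the no-team case into a single statement with a LEFT JOIN, and it assumes plays holds no duplicate (playerID, teamID) rows. It runs the same statement twice on the question's example data, once comparing the shared-team count with each player's own count and once comparing it with the average of the two counts. On engines where `/` between integers truncates (PostgreSQL and SQLite both do), the averaged form also accepts Rondo (3 teams) paired with Allen or Pierce (2 teams, both shared). The names `team_count`, `shared`, `QUERY` and `find_pairs` are illustrative, and SQLite via Python's sqlite3 module is used only to make the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (playerID INTEGER PRIMARY KEY, playerName TEXT);
-- Assumption: a player is listed at most once per team.
CREATE TABLE plays  (playerID INTEGER, teamID INTEGER,
                     PRIMARY KEY (playerID, teamID));
INSERT INTO player VALUES
    (1,'Rondo'),(2,'Allen'),(3,'Pierce'),(4,'Garnett'),(5,'Perkins');
INSERT INTO plays VALUES
    (1,1),(1,2),(1,3),(2,1),(2,3),(3,1),(3,3);
""")

QUERY = """
WITH team_count(playerID, playerName, n) AS (
    -- number of teams per player; 0 for players without a team
    SELECT p.playerID, p.playerName, COUNT(pl.teamID)
    FROM player p
    LEFT JOIN plays pl ON pl.playerID = p.playerID
    GROUP BY p.playerID, p.playerName
),
shared(p1, p2, n) AS (
    -- number of teams each ordered pair of players has in common
    SELECT a.playerID, b.playerID, COUNT(*)
    FROM plays a
    JOIN plays b ON a.teamID = b.teamID AND a.playerID < b.playerID
    GROUP BY a.playerID, b.playerID
)
SELECT c1.playerID, c1.playerName, c2.playerID, c2.playerName
FROM team_count c1
JOIN team_count c2 ON c1.playerID < c2.playerID
LEFT JOIN shared s ON s.p1 = c1.playerID AND s.p2 = c2.playerID
WHERE {check}
ORDER BY c1.playerID, c2.playerID
"""

def find_pairs(check):
    """Run QUERY with the given WHERE condition and return all rows."""
    return conn.execute(QUERY.format(check=check)).fetchall()

# Shared count must equal each player's own count.
strict = find_pairs("c1.n = c2.n AND COALESCE(s.n, 0) = c1.n")
# Shared count compared with the (integer-truncated) average of both counts.
averaged = find_pairs("COALESCE(s.n, 0) = (c1.n + c2.n) / 2")

print("strict:  ", strict)
print("averaged:", averaged)
```

Comparing against each count separately keeps the result independent of the engine's division semantics.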
janek answered Nov 15 '22 04:11