I have a PostgreSQL table that is mostly a bridge table but it also has some extra stuff.
Essentially it holds the information about players in a game. So we have a unique id for this instance of a player in a game. Then an id that is FK to game table, and an id that is FK to player table. There is also some other irrelevant stuff. Something like this:
Table players_games
| id | 12564
| player_id | 556
| game_id | 156184
What I want to do is find how many occurrences there are of a player playing with another. So, if player1 is in the same game as player2, they have played together once. There are 2+ players in a game.
So what I want to do is populate a new table, that holds three values: player_lo, player_hi, times_played.
And either have one row for each pair and the number of times they played, or if it ends up being more efficient, a row for each iteration and have the value set as 1 so these can be added together later, maybe distributed. So you might see something like:
p1, p2, 1
p1, p2, 1
And these get reduced later to:
p1, p2, 2
So I was wondering if there was some clever way to do this with SQL, or if there's SQL that can reduce my programming effort, before starting to write a slightly complex python script to do it.
To do this, you need to do a self join on the player_games table. The first subquery is for the first player, and the second for the second player. The "first" player is the one with the lower player id.
select pg1.player_id as player1, pg2.player_id as player2, count(*) as num_games
from (select distinct game_id, player_id
from players_games pg
) pg1 join
(select distinct game_id, player_id
from players_games pg
) pg2
on pg1.game_id = pg2.game_id and
pg1.player_id < pg2.player_id
group by pg1.player_id, pg2.player_id
Note that the join condition uses a "<" on the player ids. This is to prevent counting duplicates (so players A,B are not also counted as B,A).
Also, I added a "distinct" in the inner subqueries just in case a single player might appear more than once for a given game. Perhaps this is not necessary. To be sure, you should have a unique index on the composite key game_id, player_id.
select p1, p2, count(*) from (
select
pg1.player_id as p1, pg1.game_id, pg2.player_id as p2
from
players_games pg1, players_games pg2
where
pg1.game_id = pg2.game_id and pg1.player_id != pg2.player_id
) foo
group by p1, p2
Note that this does a full join on players_games so it can be very slow if the table is large. The key part is the group by for getting the count.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With