I would like to use distinct on the following table, but only on the 'PlayerID' column. This is what I have at the moment:
MATCHID PLAYERID TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
20 5 2 14-JAN-12 01-JUN-11
20 5 4 14-JAN-12 01-JUN-10
20 7 4 14-JAN-12 01-JUN-11
20 7 2 14-JAN-12 01-JUN-10
20 10 4 14-JAN-12 01-JUN-11
20 11 2 14-JAN-12 01-JUN-10
20 13 2 14-JAN-12 01-JUN-11
20 16 4 14-JAN-12 01-JUN-10
20 17 4 14-JAN-12 01-JUN-10
20 18 4 14-JAN-12 01-JUN-10
20 19 2 14-JAN-12 01-JUN-11
And this is what I want, so that the highest 'StartDate' for each 'PlayerID' is shown and the next row ignored:
MATCHID PLAYERID TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
20 5 2 14-JAN-12 01-JUN-11
20 7 4 14-JAN-12 01-JUN-11
20 10 4 14-JAN-12 01-JUN-11
20 11 2 14-JAN-12 01-JUN-10
20 13 2 14-JAN-12 01-JUN-11
20 16 4 14-JAN-12 01-JUN-10
20 17 4 14-JAN-12 01-JUN-10
20 18 4 14-JAN-12 01-JUN-10
20 19 2 14-JAN-12 01-JUN-11
Current SQL:
SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
ORDER BY pi.MatchID ASC, pi.PlayerID ASC, pf.StartDate DESC;
It's an Oracle database.
Thanks in advance.
A few points...
Unless you're using the joins to Made_Up_Of
and Season
to filter out rows, you don't need these tables. I've left them out here; you can add them back in if you need them.
Mark Tickner is correct that you should use the ANSI JOIN syntax. The nice thing about it (other than being standard) is that it puts the join logic right with the table being joined. Once you get used to it I think you'll find it preferable.
What you're really after is the maximum pf.StartDate
for each PlayerID
, which is a nice fit for the analytical ROW_NUMBER()
function. The PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC
will basically assign the value 1
to the row with each player's most recent sort date. The outer filters out all rows except those with the 1
ranking.
You can also assign rankings with the RANK()
and DENSE_RANK()
analytical functions, but if a player has a tie for the most recent date then all the tied dates will be ranked #1 and you'll get multiple rows for that player. In situations like this where you only want one row per player, use ROW_NUMBER()
instead.
Put it all together and you get this:
SELECT MatchID, PlayerID, TeamID, MatchDte, StartDate FROM (
SELECT
pi.MatchID,
pi.PlayerID,
t.TeamID,
m.MatchDate,
pf.StartDate,
ROW_NUMBER() OVER (PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC) AS StartDateRank
FROM Plays_In pi
INNER JOIN Match m ON pi.MatchID = m.MatchID
INNER JOIN Plays_A pa ON m.MatchID = pa.MatchID
INNER JOIN Team t ON pa.TeamID = t.TeamID
INNER JOIN Plays_For pf ON pf.PlayerID = pi.PlayerID AND pf.TeamID = t.TeamID
WHERE pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
)
WHERE StartDateRank = 1
ORDER BY MatchID, PlayerID
One final point: based on the WHERE pi.MatchID = '&match_id'
it looks like you may be using PHP as your front end and the mysql
functions to do the query. If so, please look into mysqli
or PDO
instead, as they'll protect you from SQL Injection. The mysql
functions (which are officially deprecated) will not.
Addendum: More information about ROW_NUMBER
, with many thanks to @AndriyM.
With ROW_NUMBER
, if a player has more than one row with the most recent date, only one of the rows will be assigned as ROW_NUMBER = 1
, and that row will be picked more or less randomly. Here's an example, where a player's most recent date is 5/1/2013 and the player has three rows with this date:
pi.MatchID pi.PlayerID pf.StartDate
---------- ----------- ------------
100 1000 05/01/2013 <-- could be ROW_NUMBER = 1
101 1000 04/29/2013
105 1000 05/01/2013 <-- could be ROW_NUMBER = 1
102 1000 05/01/2013 <-- could be ROW_NUMBER = 1
107 1000 04/18/2013
Note that only one of the rows above will be assigned ROW_NUMBER = 1
, and it can be any of them. Oracle will decide, not you.
If this uncertainty is a problem, order by additional columns to get a clear winner. For this example, the highest pi.MatchID
will be used to determine the "true" ROW_NUMBER = 1
:
-- replace `ROW_NUMBER...` in the query above with this:
ROW_NUMBER() OVER (
PARTITION BY pi.PlayerID
ORDER BY pf.StartDate DESC, pi.MatchID DESC) AS StartDateRank
Now if there's a tie for the highest pf.StartDate
, Oracle looks for the highest pi.MatchID
within the subset of rows with the highest pf.StartDate
. As it turns out, only one row satisfies this condition:
pi.MatchID pi.PlayerID pf.StartDate
---------- ----------- ------------
100 1000 05/01/2013
101 1000 04/29/2013
105 1000 05/01/2013 <-- is ROW_NUMBER = 1: highest MatchID for
-- most recent StartDate (5/1/2013)
102 1000 05/01/2013
107 1000 04/18/2013 <-- not considered: has the highest MatchID but isn't
-- in the subset with the most recent StartDate
You could use the rank() function.
SELECT * FROM (
SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate,
rank() over (partition by pi.PlayerID order by m.MatchDate desc, rowid) as RNK
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
) WHERE RNK = 1
ORDER BY MatchID ASC, PlayerID ASC, StartDate DESC;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With