
Distinct on one column only in Oracle

I would like to use distinct on the following table, but only on the 'PlayerID' column. This is what I have at the moment:

   MATCHID   PLAYERID     TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
        20          5          2 14-JAN-12 01-JUN-11
        20          5          4 14-JAN-12 01-JUN-10
        20          7          4 14-JAN-12 01-JUN-11
        20          7          2 14-JAN-12 01-JUN-10
        20         10          4 14-JAN-12 01-JUN-11
        20         11          2 14-JAN-12 01-JUN-10
        20         13          2 14-JAN-12 01-JUN-11
        20         16          4 14-JAN-12 01-JUN-10
        20         17          4 14-JAN-12 01-JUN-10
        20         18          4 14-JAN-12 01-JUN-10
        20         19          2 14-JAN-12 01-JUN-11

And this is what I want, so that only the row with the highest 'StartDate' for each 'PlayerID' is shown and that player's other rows are ignored:

   MATCHID   PLAYERID     TEAMID MATCHDATE STARTDATE
---------- ---------- ---------- --------- ---------
        20          5          2 14-JAN-12 01-JUN-11
        20          7          4 14-JAN-12 01-JUN-11
        20         10          4 14-JAN-12 01-JUN-11
        20         11          2 14-JAN-12 01-JUN-10
        20         13          2 14-JAN-12 01-JUN-11
        20         16          4 14-JAN-12 01-JUN-10
        20         17          4 14-JAN-12 01-JUN-10
        20         18          4 14-JAN-12 01-JUN-10
        20         19          2 14-JAN-12 01-JUN-11

Current SQL:

SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate
FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
WHERE pi.MatchID = m.MatchID
AND m.MatchID = pa.MatchID
AND pa.TeamID = t.TeamID
AND pf.PlayerID = pi.PlayerID
AND pf.TeamID = t.TeamID
AND muo.MatchID = pi.MatchID
AND muo.SeasonID = s.SeasonID
AND pi.MatchID = '&match_id'
AND m.MatchDate >= pf.StartDate
ORDER BY pi.MatchID ASC, pi.PlayerID ASC, pf.StartDate DESC;

It's an Oracle database.

Thanks in advance.

Mark Tickner asked May 03 '13 01:05


2 Answers

A few points...

  • Unless you're using the joins to Made_Up_Of and Season to filter out rows, you don't need these tables. I've left them out here; you can add them back in if you need them.

  • Mark Tickner is correct that you should use the ANSI JOIN syntax. The nice thing about it (other than being standard) is that it puts the join logic right with the table being joined. Once you get used to it I think you'll find it preferable.

  • What you're really after is the maximum pf.StartDate for each PlayerID, which is a nice fit for the analytical ROW_NUMBER() function. The PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC will assign the value 1 to the row with each player's most recent start date. The outer query then filters out all rows except those ranked 1.

  • You can also assign rankings with the RANK() and DENSE_RANK() analytical functions, but if a player has a tie for the most recent date then all the tied dates will be ranked #1 and you'll get multiple rows for that player. In situations like this where you only want one row per player, use ROW_NUMBER() instead.
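The difference between the three functions is easiest to see on a tie. The following is a minimal sketch, not part of the original query: the inline `starts` data set and its values are made up for illustration, with player 5 having two rows that share the latest date.

```sql
-- Hypothetical sample data: player 5 has a tie on the latest StartDate.
WITH starts AS (
  SELECT 5 AS PlayerID, 2 AS TeamID, DATE '2011-06-01' AS StartDate FROM dual UNION ALL
  SELECT 5, 4, DATE '2011-06-01' FROM dual UNION ALL
  SELECT 5, 3, DATE '2010-06-01' FROM dual UNION ALL
  SELECT 7, 4, DATE '2011-06-01' FROM dual
)
SELECT
  PlayerID,
  TeamID,
  StartDate,
  -- Always 1, 2, 3, ... within a player, even when dates tie.
  ROW_NUMBER() OVER (PARTITION BY PlayerID ORDER BY StartDate DESC) AS rn,
  -- Tied rows share a rank, and the next rank skips ahead.
  RANK()       OVER (PARTITION BY PlayerID ORDER BY StartDate DESC) AS rnk,
  -- Tied rows share a rank, and the next rank does not skip.
  DENSE_RANK() OVER (PARTITION BY PlayerID ORDER BY StartDate DESC) AS drnk
FROM starts
ORDER BY PlayerID, StartDate DESC;
```

For player 5, both tied rows get rnk = 1 and drnk = 1, so filtering on either would return two rows, while only one of them gets rn = 1.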

Put it all together and you get this:

SELECT MatchID, PlayerID, TeamID, MatchDate, StartDate FROM (
  SELECT
    pi.MatchID,
    pi.PlayerID,
    t.TeamID,
    m.MatchDate,
    pf.StartDate,
    ROW_NUMBER() OVER (PARTITION BY pi.PlayerID ORDER BY pf.StartDate DESC) AS StartDateRank
  FROM Plays_In pi
  INNER JOIN Match m ON pi.MatchID = m.MatchID
  INNER JOIN Plays_A pa ON m.MatchID = pa.MatchID
  INNER JOIN Team t ON pa.TeamID = t.TeamID
  INNER JOIN Plays_For pf ON pf.PlayerID = pi.PlayerID AND pf.TeamID = t.TeamID
  WHERE pi.MatchID = '&match_id'
  AND m.MatchDate >= pf.StartDate
)
WHERE StartDateRank = 1
ORDER BY MatchID, PlayerID

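One final point: the '&match_id' in WHERE pi.MatchID = '&match_id' looks like a SQL*Plus substitution variable, which is pasted into the statement as plain text before it runs. If this query is ever run from application code (PHP, for example) with a value that comes from a user, pass the value as a bind variable through a driver such as OCI8 or PDO instead of building it into the SQL string, as that protects you from SQL injection. The old PHP mysql functions (which are officially deprecated) offer no such protection and cannot connect to Oracle in any case.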
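The Oracle-side mechanism for keeping a value out of the statement text is a bind variable. A minimal sketch, assuming the query is run from SQL*Plus and that MatchID is numeric (neither is confirmed by the question); only the filter is shown, and the value 20 is simply the match from the sample output:

```sql
-- SQL*Plus sketch: declare a bind variable and give it a value.
VARIABLE match_id NUMBER
EXEC :match_id := 20

-- The value is bound at execution time instead of being pasted into the SQL text.
SELECT pi.MatchID, pi.PlayerID
FROM Plays_In pi
WHERE pi.MatchID = :match_id;
```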


Addendum: More information about ROW_NUMBER, with many thanks to @AndriyM.

With ROW_NUMBER, if a player has more than one row with the most recent date, only one of those rows will be assigned ROW_NUMBER = 1, and which one is arbitrary. Here's an example, where a player's most recent date is 5/1/2013 and the player has three rows with this date:

pi.MatchID  pi.PlayerID  pf.StartDate
----------  -----------  ------------
      100         1000   05/01/2013 <-- could be ROW_NUMBER = 1
      101         1000   04/29/2013
      105         1000   05/01/2013 <-- could be ROW_NUMBER = 1
      102         1000   05/01/2013 <-- could be ROW_NUMBER = 1 
      107         1000   04/18/2013

Note that only one of the rows above will be assigned ROW_NUMBER = 1, and it can be any of them. Oracle will decide, not you.

If this uncertainty is a problem, order by additional columns to get a clear winner. For this example, the highest pi.MatchID will be used to determine the "true" ROW_NUMBER = 1:

-- replace `ROW_NUMBER...` in the query above with this:
    ROW_NUMBER() OVER (
      PARTITION BY pi.PlayerID
      ORDER BY pf.StartDate DESC, pi.MatchID DESC) AS StartDateRank

Now if there's a tie for the highest pf.StartDate, Oracle looks for the highest pi.MatchID within the subset of rows with the highest pf.StartDate. As it turns out, only one row satisfies this condition:

pi.MatchID  pi.PlayerID  pf.StartDate
----------  -----------  ------------
      100         1000   05/01/2013
      101         1000   04/29/2013
      105         1000   05/01/2013 <-- is ROW_NUMBER = 1: highest MatchID for
                                     -- most recent StartDate (5/1/2013)
      102         1000   05/01/2013
      107         1000   04/18/2013 <-- not considered: has the highest MatchID but isn't
                                     -- in the subset with the most recent StartDate
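
To try the tie-breaker without the real tables, the sample rows can be fed in from an inline data set. This is a sketch only: `sample_rows` is a hypothetical stand-in for the joined tables, and the column names drop the pi. and pf. prefixes.

```sql
-- Hypothetical inline copy of the five sample rows above.
WITH sample_rows AS (
  SELECT 100 AS MatchID, 1000 AS PlayerID, DATE '2013-05-01' AS StartDate FROM dual UNION ALL
  SELECT 101, 1000, DATE '2013-04-29' FROM dual UNION ALL
  SELECT 105, 1000, DATE '2013-05-01' FROM dual UNION ALL
  SELECT 102, 1000, DATE '2013-05-01' FROM dual UNION ALL
  SELECT 107, 1000, DATE '2013-04-18' FROM dual
)
SELECT MatchID, PlayerID, StartDate
FROM (
  SELECT
    MatchID,
    PlayerID,
    StartDate,
    -- Most recent StartDate first; ties broken by the highest MatchID.
    ROW_NUMBER() OVER (
      PARTITION BY PlayerID
      ORDER BY StartDate DESC, MatchID DESC) AS StartDateRank
  FROM sample_rows
)
WHERE StartDateRank = 1;
```

With this data the query returns the single row for MatchID 105, matching the table above.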

Ed Gibbs answered Sep 18 '22 17:09


You could use the rank() function.

SELECT * FROM (
    SELECT pi.MatchID, pi.PlayerID, t.TeamID, m.MatchDate, pf.StartDate,
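     -- rank by StartDate (MatchDate is the same for every row of one match);
     -- pf.rowid breaks ties, and an unqualified rowid would be ambiguous in a join
     rank() over (partition by pi.PlayerID order by pf.StartDate desc, pf.rowid) as RNK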
    FROM Plays_In pi, Match m, Plays_A pa, Team t, Plays_For pf, Made_Up_Of muo, Season s
    WHERE pi.MatchID = m.MatchID
    AND m.MatchID = pa.MatchID
    AND pa.TeamID = t.TeamID
    AND pf.PlayerID = pi.PlayerID
    AND pf.TeamID = t.TeamID
    AND muo.MatchID = pi.MatchID
    AND muo.SeasonID = s.SeasonID
    AND pi.MatchID = '&match_id'
    AND m.MatchDate >= pf.StartDate
) WHERE RNK = 1
ORDER BY MatchID ASC, PlayerID ASC, StartDate DESC;

tylert answered Sep 20 '22 17:09