 

MSSQL Select Top 10 winning scores, including Ties and at least one from each category

I got some help finding the top 10 scores, including tied entries, with the following statement:

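-- Top 10 distinct score levels, ties included: DENSE_RANK gives tied scores the same rank, with no gaps
select T.EntryID, T.CategoryID, T.Score
from (
   select EntryID, CategoryID, Score,
          dense_rank() over(order by Score desc) as rn  -- highest score gets rank 1
   from YourTable
 ) T
where T.rn <= 10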

(thanks to mikael-eriksson: https://stackoverflow.com/users/569436/mikael-eriksson; see the question "MSSQL Selecting top 10 but include columns with duplicate values")

Here is sample data:

EntryID CategoryID  Score
3036    1           85
3159    1           85
3039    1           84
3146    1           83
3225    1           82
3045    1           82
3047    1           80
3048    1           80
3049    1           80
3193    1           80
3098    1           80
3025    1           72
3082    1           70
3167    1           70
3122    1           67
3220    1           65
3080    1           65
3168    1           64
______________________
Total Entries >= 18
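The statement uses DENSE_RANK() on Score in descending order, so "top 10" means the 10 highest distinct score levels, not 10 rows. Tied scores share a rank and no rank numbers are skipped. Below is a minimal Python sketch of that rule applied to the sample data above. The `Entry` type and `top_n_with_ties` function are hypothetical names chosen for illustration, not part of any library:

```python
from typing import NamedTuple


class Entry(NamedTuple):
    entry_id: int
    category_id: int
    score: int


def top_n_with_ties(entries, n):
    """Keep every entry whose score is among the n highest distinct scores.

    Same idea as DENSE_RANK() OVER (ORDER BY Score DESC) ... WHERE rn <= n:
    tied scores share a rank and the ranks have no gaps.
    """
    distinct_desc = sorted({e.score for e in entries}, reverse=True)
    rank = {score: i for i, score in enumerate(distinct_desc, start=1)}
    return [e for e in entries if rank[e.score] <= n]


# The 18 sample rows from the question (all Category 1).
SAMPLE = [Entry(*t) for t in [
    (3036, 1, 85), (3159, 1, 85), (3039, 1, 84), (3146, 1, 83),
    (3225, 1, 82), (3045, 1, 82), (3047, 1, 80), (3048, 1, 80),
    (3049, 1, 80), (3193, 1, 80), (3098, 1, 80), (3025, 1, 72),
    (3082, 1, 70), (3167, 1, 70), (3122, 1, 67), (3220, 1, 65),
    (3080, 1, 65), (3168, 1, 64),
]]

print(len(top_n_with_ties(SAMPLE, 10)))  # 18: the sample has exactly 10 distinct scores
print([e.entry_id for e in top_n_with_ties(SAMPLE, 3)])  # [3036, 3159, 3039, 3146]
```

With n = 8, the two lowest score levels (65 and 64) drop out. That removes exactly 3220, 3080 and 3168 and leaves 15 Category 1 rows.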

There is a requirement that the top 10 (or whatever the top N may be, e.g. top 100) include at least one entry from each category. In this case there are 3 categories.

Now all I need to do is include at least one entry per category in the top 10. For example, if all of the top 10 scores are from Category 1 and there are 3 categories, I need to drop the 2 lowest scores from Category 1 and include the highest-scoring entry from both Category 2 and Category 3.

As you can see from the results, all the entries are from Category 1. I therefore need to drop EntryIDs 3220, 3080 and 3168 from the result set, because they have the lowest scores. I also need to include the highest-scoring entry in Category 2 and the highest-scoring entry in Category 3, so that the result looks something like this:

EntryID CategoryID  Score
3036    1           85
3159    1           85
3039    1           84
3146    1           83
3225    1           82
3045    1           82
3047    1           80
3048    1           80
3049    1           80
3193    1           80
3098    1           80
3025    1           72
3082    1           70
3167    1           70
3122    1           67
3019    3           60
3800    2           54
______________________
Total Entries >= 17

The same goes for the following scenario. Let's look at the top 5 instead of the top 10 to make it a little easier on the eye. As you can see in this example, the top 5 scores exclude entries from Category 2:

EntryID CategoryID  Score
3036    1           85
3159    1           85
3039    1           84
3146    1           83
3225    1           82
3045    1           82
3019    3           60
______________________
Total Entries >= 7

In this case, entries 3225 and 3045 need to be dropped because they are the lowest-scored Category 1 entries. 3019 needs to stay: even though it's the lowest-scored entry, I need an entry from every category in the result. I also need to include the highest-scored entry from Category 2, so I would expect something like this:

EntryID CategoryID  Score
3036    1           85
3159    1           85
3039    1           84
3146    1           83
3019    3           60
3800    2           54
______________________
Total Entries >= 6

There may also be a scenario where a specific category has no entries at all. For example, if there are no Category 2 entries, the result should still be the same as the original top 5 result set above (included below for reference):

EntryID CategoryID  Score
3036    1           85
3159    1           85
3039    1           84
3146    1           83
3225    1           82
3045    1           82
3019    3           60
______________________
Total Entries >= 7

Please excuse me if I'm repeating myself; I'm just trying to make it easy to understand ;)

I really appreciate the help!

asked Aug 19 '12 01:08 by Ianc22


1 Answer

As I see it, you need to rank your rows in a more sophisticated way. Entries that are the top ones in their category should be included regardless of their scores, and all other entries should be included according to their overall ranking.

What I'm about to suggest may not be the most efficient solution, but it should work. If nothing else, it might inspire someone to come up with something better:

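DECLARE @top_n int = 10;  -- how many distinct rank levels to return (e.g. 10 or 5)

WITH ranked1 AS (
  -- RankByCategory = 1 marks the top-scoring entry (or tied entries) in each category
  SELECT
    *,
    RankByCategory = DENSE_RANK() OVER (
      PARTITION BY CategoryID
      ORDER BY Score DESC
    )
  FROM YourTable
),
ranked2 AS (
  -- Global rank: category leaders sort first, then all other entries by score
  SELECT
    *,
    FinalRank = DENSE_RANK() OVER (
      ORDER BY
        CASE RankByCategory WHEN 1 THEN 1 ELSE 2 END,
        Score DESC
    )
  FROM ranked1
)
SELECT
  EntryID,
  CategoryID,
  Score
FROM ranked2
WHERE FinalRank <= @top_n
;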

The first CTE ranks rows within each category, which tells us which entries are the top ones in their respective categories. The second CTE computes the global ranking, this time taking into account whether or not an entry is the top one in its category. The CASE expression sorts category leaders ahead of everything else, so they receive the lowest rank numbers and are therefore guaranteed to be included in the final results. (Of course, you need to make sure that the number of categories is not greater than the number of distinct values you want to receive in the output. Otherwise some category leaders may end up with a FinalRank greater than @top_n and be filtered out.)
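The parenthetical caveat is easier to see with a tiny worked case. The following is a minimal pure-Python sketch of the same two-step ranking, not the answerer's code. `final_ranks` and `select_top` are hypothetical helper names, and the four rows in the usage are made-up data chosen to trigger the caveat:

```python
from collections import defaultdict


def final_ranks(rows):
    """Mimic the two CTEs on (entry_id, category_id, score) tuples.

    Returns {entry_id: FinalRank}.
    """
    # ranked1: an entry has RankByCategory = 1 if it holds its category's top score.
    best = defaultdict(lambda: float("-inf"))
    for _, category, score in rows:
        best[category] = max(best[category], score)

    # ranked2: dense rank over (1 if category leader else 2, Score DESC).
    def sort_key(row):
        _, category, score = row
        return (1 if score == best[category] else 2, -score)

    distinct_keys = sorted({sort_key(r) for r in rows})
    rank_of = {key: i for i, key in enumerate(distinct_keys, start=1)}
    return {r[0]: rank_of[sort_key(r)] for r in rows}


def select_top(rows, top_n):
    """Entry IDs whose FinalRank is within top_n (the outer WHERE clause)."""
    return {entry_id for entry_id, rank in final_ranks(rows).items() if rank <= top_n}


# Hypothetical data: three categories whose leaders have three different scores.
rows = [(1, 1, 90), (2, 2, 80), (3, 3, 70), (4, 1, 85)]
print(sorted(select_top(rows, 3)))  # [1, 2, 3]: every category represented
print(sorted(select_top(rows, 2)))  # [1, 2]: Category 3's leader has FinalRank 3 and is cut
```

Leaders with equal scores share one rank level, just as DENSE_RANK would assign it. So the real limit is the number of distinct leader scores, and keeping the number of categories at or below @top_n is a safe upper bound.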

Here's a live example at SQL Fiddle to play with.
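As a local, self-contained alternative to a fiddle, here is a minimal sketch that runs the same two-CTE query against an in-memory SQLite database through Python's built-in `sqlite3` module. The sketch makes several assumptions:

- SQLite 3.25 or newer is available, which is needed for window functions.
- T-SQL's `alias = expression` is rewritten as `expression AS alias`.
- The variable `@top_n` becomes the named parameter `:top_n`.
- `YourTable` is the placeholder name from the question.
- The 3019 and 3800 rows come from the expected-result tables, because the sample data omits them.
- `top_n_per_category` is a hypothetical helper name.

This is SQLite, not SQL Server.

```python
import sqlite3

# Sample rows from the question, plus the Category 3 and Category 2 leaders
# (3019 and 3800) that appear in the expected results.
ROWS = [
    (3036, 1, 85), (3159, 1, 85), (3039, 1, 84), (3146, 1, 83),
    (3225, 1, 82), (3045, 1, 82), (3047, 1, 80), (3048, 1, 80),
    (3049, 1, 80), (3193, 1, 80), (3098, 1, 80), (3025, 1, 72),
    (3082, 1, 70), (3167, 1, 70), (3122, 1, 67), (3220, 1, 65),
    (3080, 1, 65), (3168, 1, 64), (3019, 3, 60), (3800, 2, 54),
]

# The answer's query in SQLite syntax ("AS alias", ":top_n" parameter).
QUERY = """
WITH ranked1 AS (
  SELECT *,
         DENSE_RANK() OVER (PARTITION BY CategoryID ORDER BY Score DESC) AS RankByCategory
  FROM YourTable
),
ranked2 AS (
  SELECT *,
         DENSE_RANK() OVER (
           ORDER BY CASE RankByCategory WHEN 1 THEN 1 ELSE 2 END, Score DESC
         ) AS FinalRank
  FROM ranked1
)
SELECT EntryID, CategoryID, Score
FROM ranked2
WHERE FinalRank <= :top_n
ORDER BY FinalRank, EntryID
"""


def top_n_per_category(rows, top_n):
    """Load rows into a throwaway in-memory table and run the two-CTE query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE YourTable (EntryID INTEGER, CategoryID INTEGER, Score INTEGER)")
        con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"top_n": top_n}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(len(top_n_per_category(ROWS, 10)))  # 17 rows, as in the expected top-10 result
    for row in top_n_per_category(ROWS, 5):
        print(row)
```

With `top_n = 5`, this returns the six entries of the expected top-5 result. Dropping the Category 2 row and rerunning with 5 gives back the original seven-row top 5, which matches the "no Category 2 entries" scenario.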

answered Sep 17 '22 22:09 by Andriy M