Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server NTILE - Same value in different quartile

I have a scenario where i'm splitting a number of results into quartilies using the SQL Server NTILE function below. The goal is to have an as equal number of rows in each class

case NTILE(4) over (order by t2.TotalStd) 
   when 1 then 'A' when 2 then 'B' when 3 then 'C' else 'D' end as Class

The result table is shown below and there is a (9,9,8,8) split between the 4 class groups A,B,C and D.

There are two results which cause me an issue, both rows have a same total std value of 30 but are assigned to different quartiles.

8   30  A
2   30  B

I'm wondering is there a way to ensure that rows with the same value are assigned to the same quartile? Can i group or partition by another column to get this behaviour?

Pos TotalStd    class
1   16  A
2   23  A
3   21  A
4   29  A
5   25  A
6   26  A
7   28  A
8   30  A
9   29  A
1   31  B
2   30  B
3   32  B
4   32  B
5   34  B
6   32  B
7   34  B
8   32  B
9   33  B
1   36  C
2   35  C
3   35  C
4   35  C
5   40  C
6   38  C
7   41  C
8   43  C
1   43  D
2   48  D
3   45  D
4   47  D
5   44  D
6   48  D
7   46  D
8   57  D
like image 438
emeraldjava Avatar asked Feb 17 '12 16:02

emeraldjava


People also ask

What will happen if there are 53 rows and you try Ntile 5?

For example if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups.

How does Ntile work in SQL?

The NTILE window function divides ordered rows in the partition into the specified number of ranked groups of as equal size as possible and returns the group that a given row falls into.

Is Ntile a ranking function?

The NTILE function is an OLAP ranking function that classifies the rows in each partition into N ranked categories, called tiles, where each category includes an approximately equal number of rows.


2 Answers

You will need to re create the Ntile function, using the rank function. The rank function gives the same rank for rows with the same value. The value later 'jumps' to the next rank as if you used row_number. We can use this behavior to mimic the Ntile function, forcing it to give the same Ntile value to rows with the same value. However - this will cause the Ntile partitions to be with a different size. See the example below for the new Ntile using 4 bins:

declare @data table ( x int )

insert @data values 
(1),(2),
(2),(3),
(3),(4),
(4),(5)

select  
    x,
    1+(rank() over (order by x)-1) * 4 / count(1) over (partition by (select 1)) as new_ntile
from @data

Results:

x   new_ntile
---------------
1   1
2   1
2   1
3   2
3   2
4   3
4   3
5   4
like image 164
Roee Anuar Avatar answered Nov 04 '22 07:11

Roee Anuar


Not sure what you're expecting to happen here, really. SQL Server has divided up the data into 4 groups of as-equal-size-as-possible, as you asked. What do you want to happen? Have a look at this example:

declare @data table ( x int )

insert @data values 
(1),(2),
(2),(3),
(3),(4),
(4),(5)

select  
    x,
    NTILE(4) over (order by x) as ntile
from @data

Results:

x           ntile
----------- ----------
1           1
2           1
2           2
3           2
3           3
4           3
4           4
5           4

Now every ntile group shares a value with the one(s) next to it! But what else should it do?

like image 35
AakashM Avatar answered Nov 04 '22 09:11

AakashM