Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to partition when ranking on a particular column?

All:

I have a data frame like the follow.I know I can do a global rank order like this:

dt <- data.frame(
    ID = c('A1','A2','A4','A2','A1','A4','A3','A2','A1','A3'),
    Value = c(4,3,1,3,4,6,6,1,8,4)
);
> dt
   ID Value
1  A1     4
2  A2     3
3  A4     1
4  A2     3
5  A1     4
6  A4     6
7  A3     6
8  A2     1
9  A1     8
10 A3     4
dt$Order <- rank(dt$Value,ties.method= "first")
> dt
   ID Value Order
1  A1     4     5
2  A2     3     3
3  A4     1     1
4  A2     3     4
5  A1     4     6
6  A4     6     8
7  A3     6     9
8  A2     1     2
9  A1     8    10
10 A3     4     7

But how can I set a rank order for a particular ID instead of a global rank order. How can I get this done? In T-SQL, we can get this done as the following syntax:

RANK() OVER ( [ < partition_by_clause > ] < order_by_clause > )

Any idea?

like image 203
RobinMin Avatar asked Apr 01 '12 03:04

RobinMin


People also ask

Can we use partition by with rank in SQL?

To partition rows and rank them by their position within the partition, use the RANK() function with the PARTITION BY clause. SQL's RANK() function allows us to add a record's position within the result set or within each partition. In our example, we rank rows within a partition.

What is rank () over partition by?

The RANK() function is a window function that assigns a rank to each row in the partition of a result set. The rank of a row is determined by one plus the number of ranks that come before it. RANK() OVER ( PARTITION BY <expr1>[{,<expr2>...}]

Can rank be used with group by?

we can use rank function and group by in the same query set but all the columns should be contained in either aggregate function or the Group by clause.


1 Answers

Many options.

Using ddply from the plyr package:

library(plyr)
ddply(dt,.(ID),transform,Order = rank(Value,ties.method = "first"))
   ID Value Order
1  A1     4     1
2  A1     4     2
3  A1     8     3
4  A2     3     2
5  A2     3     3
6  A2     1     1
7  A3     6     2
8  A3     4     1
9  A4     1     1
10 A4     6     2

Or if performance is an issue (i.e. very large data) using the data.table package:

library(data.table)
DT <- data.table(dt,key = "ID")
DT[,transform(.SD,Order = rank(Value,ties.method = "first")),by = ID]
      ID Value Order
 [1,] A1     4     1
 [2,] A1     4     2
 [3,] A1     8     3
 [4,] A2     3     2
 [5,] A2     3     3
 [6,] A2     1     1
 [7,] A4     1     1
 [8,] A4     6     2
 [9,] A3     6     2
[10,] A3     4     1

or in all its gory detail a base R solution using split lapply do.call and rbind:

do.call(rbind,lapply(split(dt,dt$ID),transform,
              Order = rank(Value,ties.method = "first")))
like image 77
joran Avatar answered Sep 30 '22 19:09

joran