Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rank values in r datatable grouped by another variable

Tags:

r

data.table

rank

I would like to use datatable's frank function to rank the date column by id. However, my rankings only seem to take into consideration the date column and not the id corresponding to it. I also receive 6 of these warnings that I'm not sure about:

1..... 6: In [.data.table(dups, , :=(rank, frank(dups, date, ties.method = "average")), : RHS 1 is length 10 (greater than the size (1) of group 6). The last 9 element(s) will be discarded.

dups <- data.table (id = c('11', '11', '11', '22','22',
  '88', '99','44','44', '55'),
  date = mdy(c("1-01-2016", "1-02-2016", "1-02-2016","2-01-2016", 
  "2-02-2016")))

so.sample <- dups[, rank := frank(dups, date, ties.method = "average"), by = id]

For example, id = 11 and date = 2016-01-01 should rank 1 instead of 1.5, because there is only one id and date with that combination.

thx for help

like image 742
user3067851 Avatar asked May 18 '16 16:05

user3067851


People also ask

How do I rank a table in R?

The ranking of a variable in an R data frame can be done by using rank function. For example, if we have a data frame df that contains column x then rank of values in x can be found as rank(df$x).

How does rank work in R?

rank returns a vector with the "rank" of each value. the number in the first position is the 9th lowest. order returns the indices that would put the initial vector x in order. The 27th value of x is the lowest, so 27 is the first element of order(x) - and if you look at rank(x) , the 27th element is 1 .


1 Answers

It works just fine with both, 'rank' and 'frank'. Maybe your date variable was not formatted correctly. Here is the code:

dt1 <- data.table (id = c('11', '11', '11', '22','22',
                      '88', '99','44','44', '55'),
               date = as.Date(c("01-01-2016", 
                                "01-02-2016", 
                                "01-02-2016",
                                "02-01-2016", 
                                "02-02-2016"),
                              format = "%m-%d-%Y"))
setkey(dt1, date)
setkey(dt1, id)
dt1

    id       date
 1: 11 2016-01-01
 2: 11 2016-01-02
 3: 11 2016-01-02
 4: 22 2016-02-01
 5: 22 2016-02-02
 6: 44 2016-01-02
 7: 44 2016-02-01
 8: 55 2016-02-02
 9: 88 2016-01-01
10: 99 2016-01-02

dt1[, rank := frank(date),
    by = list(id)]
dt1

    id       date  rank
 1: 11 2016-01-01   1.0
 2: 11 2016-01-02   2.5
 3: 11 2016-01-02   2.5
 4: 22 2016-02-01   1.0
 5: 22 2016-02-02   2.0
 6: 44 2016-01-02   1.0
 7: 44 2016-02-01   2.0
 8: 55 2016-02-02   1.0
 9: 88 2016-01-01   1.0
10: 99 2016-01-02   1.0

Additionally, if you just want to enumerate your records, using .N can be quite helpful:

dt1[, Visit := 1:.N,
    by = list(id)]
dt1

    id       date rank Visit
 1: 11 2016-01-01  1.0     1
 2: 11 2016-01-02  2.5     2
 3: 11 2016-01-02  2.5     3
 4: 22 2016-02-01  1.0     1
 5: 22 2016-02-02  2.0     2
 6: 44 2016-01-02  1.0     1
 7: 44 2016-02-01  2.0     2
 8: 55 2016-02-02  1.0     1
 9: 88 2016-01-01  1.0     1
10: 99 2016-01-02  1.0     1

I hope this helps.

like image 187
Davit Sargsyan Avatar answered Oct 15 '22 16:10

Davit Sargsyan