
What is the most efficient way to return ranks of a vector within levels of a factor, as a vector having the same order/length as the original vector?

Tags: r

One further requirement: the resulting vector must be in the same order as the original.

I have a very basic function that converts a vector to percentiles, and it works just the way I want:

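# Percentile of each element of x, scaled to [0, 1]
ptile <- function(x) {
  # (rank - 1) divided by (number of non-missing values - 1)
  p <- (rank(x) - 1)/(length(which(!is.na(x))) - 1)
  # anything above 1 can only come from a missing value ranked last
  p[p > 1] <- NA
  p
}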

data <- c(1, 2, 3, 100, 200, 300)

For example, ptile(data) generates:

[1] 0.0 0.2 0.4 0.6 0.8 1.0
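
As a minimal sketch of what the `p[p > 1] <- NA` line guards against, here is the same function applied to a vector containing a missing value. The vector `with_na` and the result name `res_na` are made up for illustration, and `ptile` is repeated only so the snippet runs on its own.

```r
# ptile() copied from above so this snippet is self-contained
ptile <- function(x) {
  p <- (rank(x) - 1)/(length(which(!is.na(x))) - 1)
  p[p > 1] <- NA
  p
}

# Hypothetical input with one missing value
with_na <- c(10, NA, 30)

# Only the two non-missing values count toward the denominator,
# and the missing value comes back as NA
res_na <- ptile(with_na)
print(res_na)
```

The non-missing values still span 0 to 1, so a missing observation does not distort the percentiles of the others.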

What I'd really like to be able to do is use this same function (ptile) and have it work within levels of a factor. So suppose I have a "factor" f as follows:

f <- as.factor(c("a", "a", "b", "a", "b", "b"))

I'd like to be able to transform "data" into a vector that tells me, for each observation, what its corresponding percentile is relative to other observations within its same level, like this:

0.0 0.5 0.0 1.0 0.5 1.0

As a shot in the dark, I tried:

tapply(data,f,ptile)

and saw that it does, in fact, do the ranking/percentiling, but it returns the results grouped by level, so I can't tell which values correspond to which positions in the original vector:

> f
[1] a a b a b b
Levels: a b
> tapply(data,f,ptile)
$a
[1] 0.0 0.5 1.0

$b
[1] 0.0 0.5 1.0

This matters because the actual data I'm working with can have 1000-3000 observations (stocks) and 10-55 levels (sectors, groupings by other stock characteristics, etc.), and I need the resulting vector to come out in the same order it went in so that everything lines up row by row in my matrix.

Is there some "apply" variant that would do what I am seeking? Or a few quick lines that would do the trick? I've written this functionality in C# and F# with a lot more lines of code, but had figured that in R there must be some really direct, elegant solution. Is there?

Thanks in advance!

user297400 asked Dec 07 '22 23:12


1 Answer

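The ave function is very useful here: it applies a function within groups and returns a vector of the same length and order as its input. The main gotcha is that you always need to name the function with FUN=, because in ave's signature it comes after the ... that collects the grouping variables: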

> dt <- data.frame(data, f)
> dt$rank <- with(dt, ave(data, list(f), FUN=rank))
> dt
  data f rank
1    1 a    1
2    2 a    2
3    3 b    1
4  100 a    3
5  200 b    2
6  300 b    3
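
The data frame is a convenience rather than a requirement. Here is a minimal sketch that calls ave directly on the two vectors from the question; the name `ranks_within` is introduced here purely for illustration.

```r
# Data and grouping factor from the question
data <- c(1, 2, 3, 100, 200, 300)
f <- as.factor(c("a", "a", "b", "a", "b", "b"))

# Rank within each level of f; element i of the result belongs to data[i]
ranks_within <- ave(data, f, FUN = rank)
print(ranks_within)
```

Because the result has the same length and order as `data`, it can be bound straight back onto the rows it came from.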

Edit: I thought I was answering the question in the title but have been asked to include the code that uses the "ptile" function:

> dt$ptile <-  with(dt, ave(data, list(f), FUN=ptile))
> dt
  data f rank ptile
1    1 a    1   0.0
2    2 a    2   0.5
3    3 b    1   0.0
4  100 a    3   1.0
5  200 b    2   0.5
6  300 b    3   1.0
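
For intuition, the same result can be reached by splitting the vector by level, applying the function to each piece, and putting the pieces back in their original positions. The sketch below uses base R's split, lapply and unsplit. It is meant as an illustration of the idea, not a description of how ave is implemented, and the names `by_level`, `pct_by_level` and `pct` are illustrative.

```r
# ptile() and the example data from the question, repeated so this runs alone
ptile <- function(x) {
  p <- (rank(x) - 1)/(length(which(!is.na(x))) - 1)
  p[p > 1] <- NA
  p
}
data <- c(1, 2, 3, 100, 200, 300)
f <- as.factor(c("a", "a", "b", "a", "b", "b"))

# One numeric vector per level of f
by_level <- split(data, f)

# Percentile within each level (the same pieces tapply returned)
pct_by_level <- lapply(by_level, ptile)

# Reassemble the pieces into the original order of data
pct <- unsplit(pct_by_level, f)
print(pct)
```

The ave call above is shorter, but the explicit version shows where the original ordering comes from: unsplit uses `f` to place each value back at the position it was taken from.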
IRTFM answered Jan 04 '23 23:01