 

Finding the column number and value of the second highest value in a row

Tags: dataframe, r, plyr

I am trying to write some code that identifies the two greatest values in each row and returns their column numbers and values.

df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

I've managed to do this for the maximum value using the max and max.col functions.

df$max <- max.col(df[, 1:4], ties.method = "first")
df$val <- apply(df[, 1:4], 1, max)
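As a hedged aside, the value can also be read straight from the max.col positions with matrix indexing instead of a second apply pass. The names m, max_idx, max_val and max_name are illustrative and not from the question.

```r
# Example data from the question
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
m <- as.matrix(df[, 1:4])

# Column number of each row's maximum (leftmost column wins on ties)
max_idx <- max.col(m, ties.method = "first")

# A two-column (row, column) index matrix selects exactly one cell per row
max_val <- m[cbind(seq_len(nrow(m)), max_idx)]

# Column names are often easier to read than column numbers
max_name <- colnames(m)[max_idx]
```

The same cbind(row, column) indexing works for any vector of column positions, not just the maximum.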

As far as I know, there is no equivalent function for the second highest value, which makes this a little trickier. The code below returns the second highest value, but (importantly) not when there are ties: every copy of the maximum is removed first, so for a row such as 1, 1, 0, 0 it returns 0 rather than 1. It also looks fragile.

# second highest distinct value: every copy of the maximum is dropped first
sec.fun <- function(x) {
  max(x[x != max(x)])
}

df$val2 <- apply(df[ ,1:4], 1, sec.fun)
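One way to make the second highest value tie-aware, sketched under the assumption that a repeated maximum should count as both the first and the second highest, is to sort each row in decreasing order and take the second element. sec_tie is a hypothetical name. Unlike sec.fun, it returns 1 rather than 0 for row 4 (1, 1, 0, 0), and it does not return -Inf with a warning when every value in a row is equal.

```r
# Example data from the question
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

# Sort the row from largest to smallest and take the 2nd element, so a
# maximum that appears twice occupies both the 1st and 2nd places
sec_tie <- function(x) sort(x, decreasing = TRUE)[2]

val2 <- apply(df[, 1:4], 1, sec_tie)
```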

Ideally the solution would not involve removing any of the original data and could also be used to find the third, fourth... highest values, but neither of these is essential.
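Both wishes can be met by ranking every row at once while leaving df untouched. This is a sketch, and the names pos and vals are illustrative: it builds one matrix of column numbers and one of values, where column k holds the k-th highest entry of each row. Ties are resolved left to right because order is stable.

```r
# Example data from the question
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
m <- as.matrix(df[, 1:4])

# apply() over rows returns one column per row, so transpose back to rows
pos  <- t(apply(m, 1, order, decreasing = TRUE))  # column numbers by rank
vals <- t(apply(m, 1, sort,  decreasing = TRUE))  # values by rank

# sort() keeps the first row's element names, which would be misleading here
colnames(pos) <- colnames(vals) <- paste0("rank", seq_len(ncol(m)))

pos[, 2]   # 2 3 1 2 1  (column of the 2nd highest value)
vals[, 3]  # 0 2 0 0 0  (3rd highest value)
```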

BuckyOH asked Apr 24 '12 11:04


1 Answer

Try this (the output below uses the original four-column df):

# a function that returns the position of the n-th largest value
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

maxn(n) returns a closure, so you can use it like this:

> # position of the largest
> apply(df, 1, maxn(1))
[1] 1 4 3 1 4
> # position of the 2nd largest
> apply(df, 1, maxn(2))
[1] 2 3 1 2 1
> 
> # value of the largest
> apply(df, 1, function(x)x[maxn(1)(x)])
[1] 2 4 2 1 1
> # value of the 2nd largest
> apply(df, 1, function(x)x[maxn(2)(x)])
[1] 0 3 1 1 0
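The positions and values can be combined into one summary table. This is a sketch that reuses the answer's maxn; the top2 table and its column names are illustrative.

```r
# Example data from the question
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

# maxn() as defined in the answer
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# One row per original row: names and values of the two largest entries
top2 <- data.frame(
  first      = names(df)[apply(df, 1, maxn(1))],
  first_val  = apply(df, 1, function(x) x[maxn(1)(x)]),
  second     = names(df)[apply(df, 1, maxn(2))],
  second_val = apply(df, 1, function(x) x[maxn(2)(x)])
)
top2
```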

Updated

Why use a closure here?

One reason is that you can define functions such as:

max2 <- maxn(2)
max3 <- maxn(3)

and then use them:

> apply(df, 1, max2)
[1] 2 3 1 2 1
> apply(df, 1, max3)
[1] 3 2 2 3 2

I'm not sure the advantage is obvious, but I like this approach because it is more functional in style.
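To illustrate that functional style, the sketch below applies maxn for several ranks in one call and, for comparison, shows a non-closure version that passes n through apply's ... argument. ranks, posn and second are illustrative names.

```r
# Example data from the question
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# Closure style: build the n-th finder for n = 1, 2, 3 and apply each to the
# rows; sapply() binds the three result vectors into a 5 x 3 matrix
ranks <- sapply(1:3, function(n) apply(df, 1, maxn(n)))

# Non-closure alternative: apply() forwards extra named arguments to FUN
posn <- function(x, n) order(x, decreasing = TRUE)[n]
second <- apply(df, 1, posn, n = 2)
```

Both give the same positions; the closure mainly pays off when a rank-specific function is stored under its own name (as with max2 and max3) or handed to another higher-order function.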

kohske answered Oct 05 '22 12:10