
Checking whether at least one value in a dataframe row is bigger than a given row-specific threshold

Tags:

r

apply

This is a small reproducible example of the dataset I am working on:

set.seed(123)
dat <- as.data.frame( cbind(a=1+round(runif(5), 2), b=round(rnorm(5), 2), high_cutoff=round(1+rnorm(5), 1)) )

The dataframe is:

     a     b   high_cutoff
   1.29 -1.69         2.3
   1.79  1.24        -0.7
   1.41 -0.11         2.7
   1.88 -0.12         1.5
   1.94  0.18         3.5

I am trying to check, row by row, whether at least one value in the first two columns is higher than the corresponding threshold in the third column (say that I want to store a 1 if either of the two values is higher than the cutoff).

In the example, what I expect to find is:

   higher_than_cutoff         
0
1
0 
1
0

I've been trying to use the following (wrong) code, and some variations of it, without much success:

higher_than_cutoff <- apply( dat[, c("a", "b")], 1, function(x) any(x > dat[, "high_cutoff"]) )

Can you please give some advice on how to proceed? Any help is highly appreciated.

Stefano Lombardi asked Dec 08 '22 02:12


2 Answers

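You could try:

# row-wise maximum of the first two columns, compared with the cutoff
as.integer(do.call(pmax, dat[-3]) > dat[, 3])
#[1] 0 1 0 1 0

Or:

# 1 if the largest value in the row is not in the cutoff column (column 3)
(max.col(dat) != 3) + 0L
#[1] 0 1 0 1 0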
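To see what the pmax version is doing, it may help to split it into steps. This is only a sketch: the data frame is typed in from the table printed in the question rather than regenerated from the seed, and row_max is a name introduced here for illustration.

```r
# Data copied by hand from the table in the question (an assumption: this
# avoids depending on the random number generator)
dat <- data.frame(
  a = c(1.29, 1.79, 1.41, 1.88, 1.94),
  b = c(-1.69, 1.24, -0.11, -0.12, 0.18),
  high_cutoff = c(2.3, -0.7, 2.7, 1.5, 3.5)
)

# dat[-3] drops the cutoff column; do.call() passes the remaining columns
# to pmax() as separate arguments, giving the maximum of a and b per row
row_max <- do.call(pmax, dat[-3])

# If the largest value does not exceed the cutoff, no value in the row does
higher_than_cutoff <- as.integer(row_max > dat$high_cutoff)
```

Because do.call() hands every remaining column to pmax(), the same line should keep working if more value columns are added, as long as the cutoff column is the one being dropped.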
akrun answered Jan 22 '23 11:01


Here's a possible vectorized solution (if you are fine with just TRUE/FALSE, you can remove the + at the beginning):

+(rowSums(dat[-3L] > dat[, 3L]) > 0)
## [1] 0 1 0 1 0
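The intermediate results of the rowSums approach can be inspected one at a time. A rough sketch, with the data typed in from the table in the question; exceeds and n_above are names made up here for illustration.

```r
# Data copied by hand from the table in the question
dat <- data.frame(
  a = c(1.29, 1.79, 1.41, 1.88, 1.94),
  b = c(-1.69, 1.24, -0.11, -0.12, 0.18),
  high_cutoff = c(2.3, -0.7, 2.7, 1.5, 3.5)
)

# Comparing the two value columns with the cutoff vector gives a 5 x 2
# logical matrix; the cutoff is recycled down each column, so the values
# in row i are always compared with cutoff i
exceeds <- dat[-3L] > dat[, 3L]

# Number of values above the cutoff in each row: 0 2 0 1 0
n_above <- rowSums(exceeds)

# At least one value above the cutoff, converted from TRUE/FALSE to 1/0
# by the unary plus
higher_than_cutoff <- +(n_above > 0)
```

Keeping n_above around can be handy if you later need "at least two values above the cutoff" or similar, since only the final comparison changes.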

If you insist on apply, you can do something like

apply(dat, 1, function(x) +(any(x[-3] > x[3])))
## [1] 0 1 0 1 0
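As for why the attempt in the question goes wrong: inside apply, x holds only a and b for the current row, while dat[, "high_cutoff"] is still the whole column, so each row ends up being tested against all five cutoffs. A small side-by-side sketch, using data typed in from the question's table (wrong and right are names chosen here for illustration):

```r
# Data copied by hand from the table in the question
dat <- data.frame(
  a = c(1.29, 1.79, 1.41, 1.88, 1.94),
  b = c(-1.69, 1.24, -0.11, -0.12, 0.18),
  high_cutoff = c(2.3, -0.7, 2.7, 1.5, 3.5)
)

# Attempt from the question: the length-2 row is recycled against the
# length-5 cutoff column (R warns about the mismatched lengths, which is
# suppressed here only to keep the output quiet)
wrong <- suppressWarnings(
  apply(dat[, c("a", "b")], 1, function(x) any(x > dat[, "high_cutoff"]))
)
# FALSE TRUE TRUE TRUE TRUE: rows 3 and 5 are flagged by mistake

# Passing the cutoff in as part of the row keeps the comparison row-specific
right <- apply(dat, 1, function(x) any(x[-3] > x[3]))
# FALSE TRUE FALSE TRUE FALSE
```

Rows 3 and 5 come out TRUE in the first version because their b value happens to exceed the cutoff of row 2 (-0.7), not their own.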
David Arenburg answered Jan 22 '23 10:01