Element wise comparison in R

Tags: arrays, loops, r

I'm attempting to write a for loop that compares each individual's value with every other individual's value, but not with its own. The following data frame contains values for five subjects:

           Value1 
Subject1   0      
Subject2   1      
Subject3   5      
Subject4   6      
Subject5   8      

I've written a double loop that creates a 'Value2' variable by comparing each subject with every other subject and summing the results according to the following criteria:

  1. If the subject has a larger Value1 than the other subject, then the result is +1.
  2. If the subject has an equal Value1, then the result is 0.
  3. If the subject has a smaller Value1 than the other subject, then the result is -1.

For example, Subject 1's Value1 is smaller than that of each of the other four subjects, so its Value2 should be -4. So far the loop I've written works for the first subject but fails to produce a result for the second subject onward.

Value2<-0
i = 0
w = 0

for(i in 1:length(Value1)){
    for(j in 1:length(Value1)){
        if(i != j){
            Value1[i] = w
            if(w > Value1[j]){
                Value2[i] = Value2[i] + 1
            }    
            if(w < Value1[j]){
                Value2[i] = Value2[i] - 1
            } 
            if(w == Value1[j]){
                Value2[i] = Value2[i] + 0
            }
        }
    }
}
statsguyz asked Dec 06 '22 09:12



1 Answer

If I'm understanding the problem correctly, this should give you what you want:

x <- c(0, 1, 5, 6, 8)
colSums(outer(x, x, '<')) - colSums(outer(x, x, '>'))
# [1] -4 -2  0  2  4

Or

-colSums(sign(outer(x, x, '-')))
# [1] -4 -2  0  2  4
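
For comparison, the double loop from the question can be made to work with two small changes. This is only a sketch of one possible fix, under the assumption that Value1 is a plain numeric vector: the assignment Value1[i] = w appears to be reversed (it overwrites the data with w instead of reading from it), and Value2 starts as a single 0, so Value2[i] is NA for every i after the first. Replacing the three if statements with sign() is a simplification introduced here, not something from the question.

```r
# Values from the question, as a plain numeric vector
Value1 <- c(0, 1, 5, 6, 8)

# Preallocate one result slot per subject (the original had a single 0)
Value2 <- numeric(length(Value1))

for (i in seq_along(Value1)) {
  w <- Value1[i]  # read the current subject's value; do not overwrite Value1
  for (j in seq_along(Value1)) {
    if (i != j) {
      # sign() gives +1, 0, or -1 for larger, equal, or smaller
      Value2[i] <- Value2[i] + sign(w - Value1[j])
    }
  }
}

Value2
# [1] -4 -2  0  2  4
```

The loop still does N^2 comparisons, so it has no advantage over the vectorized versions beyond being easier to step through.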

Edit: If your vector is large (or even if it isn't, really), use d.b.'s rank method instead. The outer function creates an N x N matrix, where N is the length of x. For example, when x is sample(1e5), outer will attempt to create a matrix more than 30 GB in size, which is more memory than most people's laptops had in 2019, so this method will not work on large vectors. With the same x, the rank method provided by d.b. returns the result almost instantly.
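
A minimal sketch of why the rank approach gives the same numbers: with the default ties.method = "average", an element's rank is (number smaller) + (number tied, including itself, + 1)/2, and the mean rank is (N + 1)/2, so 2*(r - mean(r)) reduces to (number smaller) - (number larger). The function names rank_score and outer_score are hypothetical helpers for this illustration, and the memory figure assumes 4 bytes per matrix cell (integer or logical).

```r
# Hypothetical helper: wraps the rank-based expression used in the benchmark
rank_score <- function(x) {
  r <- rank(x)        # default ties.method = "average"
  2 * (r - mean(r))   # (# smaller) - (# larger) for each element
}

# Reference version built on the N x N sign matrix
outer_score <- function(x) -colSums(sign(outer(x, x, "-")))

x <- c(0, 1, 5, 6, 8)
rank_score(x)
# [1] -4 -2  0  2  4

# With ties, the two versions should still agree
y <- c(3, 1, 3, 2, 1)
all.equal(rank_score(y), outer_score(y))
# [1] TRUE

# Rough size of the outer() matrix for 1e5 integers, assuming 4 bytes per cell
1e5^2 * 4 / 2^30   # in GiB, roughly 37
```

The rank version only sorts the vector, so it needs memory proportional to N rather than N^2.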

Benchmark for a vector of size 1000:

library(microbenchmark)

x <- sample(1000)
microbenchmark(
  outer_diff = colSums(-sign(outer(x, x, '-'))),
  outer_gtlt = colSums(outer(x, x, '<')) - colSums(outer(x, x, '>')),
  rank = {r <- rank(x); 2*(r - mean(r))}
)
# Unit: microseconds
#        expr      min         lq       mean    median        uq        max neval cld
#  outer_diff 15930.26 16872.4175 20946.2980 18030.776 25346.677  38668.324   100   b
#  outer_gtlt 14168.21 15120.4165 28970.7731 16698.264 23857.651 352390.298   100   b
#        rank   111.18   141.5385   170.8885   177.026   188.513    282.257   100  a 
IceCreamToucan answered Dec 08 '22 00:12
